Removing INNER JOINs

Started by David Rowley about 11 years ago, 52 messages

dgrowleyml@gmail.com

about 11 years ago

1 attachment(s)

Hi,

Starting a new thread which continues on from
/messages/by-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com

To give a brief summary for any new readers:

The attached patch allows INNER JOINed relations to be removed from the
plan, provided that none of their columns are used for anything and a
foreign key exists which proves that a record matching the join condition
must exist in the table being removed.

Example:

test=# create table b (id int primary key);
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references b(id));
CREATE TABLE
test=# explain (costs off) select a.* from a inner join b on a.b_id = b.id;
QUERY PLAN
---------------
Seq Scan on a

This has worked for a few years now for LEFT JOINs, so this patch just
extends the join types that can be removed.

This optimisation should prove quite useful for views where a query uses
only a subset of the view's columns.

The attached is an updated patch which fixes a conflict with a recent
commit and also fixes a bug where join removals did not work properly for
PREPAREd statements.

I'm looking for a bit of feedback on the method I'm using to prune the
redundant plan nodes out of the plan tree at executor startup, particularly
on not stripping the Sort nodes out from below a merge join even when the
sort order is no longer required because the merge join node has been
removed. This could leave the plan suboptimal compared to the plan the
planner would generate if the removed relation had never been asked for in
the first place.
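To make the pruning step easier to picture, here is a minimal standalone
sketch of the idea. It is not the patch's code: ToyPlan, toy_side_removable
and toy_implode are hypothetical names, the "skippable" flag stands in for
the RangeTblEntry's skipJoinPossible, and the real plan node types are
reduced to three tags. It walks the tree bottom-up, replaces a join with
its surviving input, and keeps any Sort sitting above that input, which is
the behaviour the question above is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for PostgreSQL plan nodes. */
typedef enum { TOY_SCAN, TOY_SORT, TOY_JOIN } ToyTag;

typedef struct ToyPlan
{
	ToyTag		tag;
	bool		skippable;	/* scans only: models rte->skipJoinPossible */
	struct ToyPlan *left;
	struct ToyPlan *right;
} ToyPlan;

/* Can this join input be dropped?  Looks through a single Sort node. */
static bool
toy_side_removable(const ToyPlan *side)
{
	if (side == NULL)
		return true;			/* already pruned by an earlier step */
	if (side->tag == TOY_SORT)
		side = side->left;
	if (side == NULL)
		return true;			/* everything below the Sort was pruned */
	return side->tag == TOY_SCAN && side->skippable;
}

/* Post-order walk: replace each removable join with its surviving input. */
static void
toy_implode(ToyPlan **node)
{
	bool		dropLeft;
	bool		dropRight;

	assert(node != NULL);
	if (*node == NULL)
		return;

	toy_implode(&(*node)->left);
	toy_implode(&(*node)->right);

	if ((*node)->tag != TOY_JOIN)
		return;

	dropLeft = toy_side_removable((*node)->left);
	dropRight = toy_side_removable((*node)->right);

	if (dropLeft && dropRight)
		*node = NULL;				/* the join is not required at all */
	else if (dropLeft)
		*node = (*node)->right;		/* a Sort above the scan is kept */
	else if (dropRight)
		*node = (*node)->left;		/* a Sort above the scan is kept */
	/* otherwise both sides are required and the join stays */
}
```

In this toy model, a join of Sort(a) with Sort(b) where only b is
skippable collapses to Sort(a): the join disappears but the sort is still
executed, even if nothing above it needs the ordering any more.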

There are also other cases, such as a btree index scan chosen only to
provide ordered input for a MergeJoin, which would be better executed as a
SeqScan once the MergeJoin has been removed.

Perhaps some costs could be adjusted at planning time when there's a
possibility that joins could be removed at execution time, although I'm not
quite sure about this, as it risks generating a poor plan in the case where
the joins cannot be removed.

I currently can't see much of a way around these cases, but in both of them
removing the join should still prove to be a win, just perhaps not with the
most optimal of plans.

There are some more details on the reasons behind doing this weird executor
startup plan pruning here:

/messages/by-id/20141006145957.GA20577@awork2.anarazel.de
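In short, as the comment added to remove_useless_joins() in the patch
explains, a foreign key is effectively violated while its triggers are
still pending, so the planner can only flag a relation as possibly
skippable and the executor makes the final call. The following is a rough
standalone sketch of that two-step decision. The struct and function names
are hypothetical; only the two conditions tested (query_depth == -1 and an
empty event list) are taken from the patch's AfterTriggerQueueIsEmpty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the after-trigger queue state. */
typedef struct ToyTriggerQueue
{
	int			query_depth;	/* the patch compares this against -1 */
	const void *events_head;	/* NULL when no events are queued */
} ToyTriggerQueue;

/* Hypothetical stand-in for a range table entry. */
typedef struct ToyRangeTblEntry
{
	bool		skipJoinPossible;	/* set at plan time */
} ToyRangeTblEntry;

/* Mirrors the two conditions tested by AfterTriggerQueueIsEmpty(). */
static bool
toy_queue_is_empty(const ToyTriggerQueue *queue)
{
	assert(queue != NULL);
	return queue->query_depth == -1 && queue->events_head == NULL;
}

/*
 * Plan time says "maybe"; executor startup says "yes" only if no trigger
 * events are pending, since a pending foreign key check means the key
 * cannot yet be trusted as proof that a matching row exists.
 */
static bool
toy_can_skip_join(const ToyRangeTblEntry *rte, const ToyTriggerQueue *queue)
{
	return rte->skipJoinPossible && toy_queue_is_empty(queue);
}
```

Because the check happens each time the plan is started rather than once
when it is built, a cached plan (as used for a PREPAREd statement) can
skip the join on one execution and perform it on another.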

Comments are most welcome.

Regards

David Rowley

Attachments:

inner_join_removals_2014-11-29_be69869.patch (application/octet-stream)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ebccfea..ea26615 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3889,6 +3889,17 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	return all_fired;
 }
 
+/* ----------
+ * AfterTriggerQueueIsEmpty()
+ *
+ *	True if there are no pending triggers in the queue.
+ * ----------
+ */
+bool
+AfterTriggerQueueIsEmpty(void)
+{
+	return (afterTriggers.query_depth == -1 && afterTriggers.events.head == NULL);
+}
 
 /* ----------
  * AfterTriggerBeginXact()
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8c799d3..8d0c9be 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -884,6 +884,10 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		i++;
 	}
 
+	if (AfterTriggerQueueIsEmpty())
+		ExecImplodePlan(&plan, estate);
+
+
 	/*
 	 * Initialize the private state information for all the nodes in the query
 	 * tree.  This opens files, allocates storage and leaves us ready to start
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d5e1273..06ac364 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -45,6 +45,7 @@
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "catalog/index.h"
+#include "commands/trigger.h"
 #include "executor/execdebug.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parsetree.h"
@@ -54,6 +55,11 @@
 
 
 static bool get_last_attnums(Node *node, ProjectionInfo *projInfo);
+static bool ExecCanSkipScanNode(Plan *scannode, EState *estate);
+static Plan *ExecTryRemoveHashJoin(Plan *hashjoin, EState *estate);
+static Plan *ExecTryRemoveNestedLoop(Plan *nestedloop, EState *estate);
+static Plan *ExecTryRemoveMergeJoin(Plan *mergejoin, EState *estate);
+static void ExecResetJoinVarnos(List *tlist);
 static bool index_recheck_constraint(Relation index, Oid *constr_procs,
 						 Datum *existing_values, bool *existing_isnull,
 						 Datum *new_values);
@@ -661,6 +667,271 @@ get_last_attnums(Node *node, ProjectionInfo *projInfo)
 								  (void *) projInfo);
 }
 
+/* Returns true if the scan node may be skipped, otherwise returns false */
+static bool
+ExecCanSkipScanNode(Plan *scannode, EState *estate)
+{
+	Scan *scan = (Scan *) scannode;
+	RangeTblEntry *rte;
+
+	switch (nodeTag(scannode))
+	{
+		case T_IndexOnlyScan:
+		case T_IndexScan:
+		case T_SeqScan:
+			rte = (RangeTblEntry *) list_nth(estate->es_range_table, scan->scanrelid - 1);
+			return rte->skipJoinPossible;
+		default:
+			/* If it's not a scan node then we can't remove it */
+			return false;
+	}
+}
+
+
+static Plan *
+ExecTryRemoveHashJoin(Plan *hashjoin, EState *estate)
+{
+	bool canRemoveLeft = false;
+	bool canRemoveRight = false;
+	Plan *leftnode = hashjoin->lefttree;
+	Plan *rightnode = hashjoin->righttree;
+
+	/*
+	 * If the left node is NULL, then most likely the node has already been
+	 * removed, in this case we can skip it
+	 */
+	if (leftnode == NULL)
+		canRemoveLeft = true;
+	else
+		canRemoveLeft = ExecCanSkipScanNode(leftnode, estate);
+
+	if (rightnode == NULL)
+		canRemoveRight = true;
+	else
+	{
+		if (nodeTag(rightnode) != T_Hash)
+		{
+			elog(ERROR, "HashJoin's righttree node should be a Hash node");
+			return hashjoin;
+		}
+
+		/* move to the node where the hash is getting tuples from */
+		rightnode = rightnode->lefttree;
+
+		/*
+		 * If this node is NULL then most likely a hashjoin has been completely
+		 * removed from below the T_Hash node. In this case we can certainly
+		 * remove the right node as there's nothing under it.
+		 */
+		if (rightnode == NULL)
+			canRemoveRight = true;
+		else
+			canRemoveRight = ExecCanSkipScanNode(rightnode, estate);
+	}
+
+	if (canRemoveLeft)
+	{
+		if (canRemoveRight)
+			return NULL; /* this join is not required at all */
+		else
+			return rightnode;
+	}
+	else
+	{
+		if (canRemoveRight)
+			return leftnode; /* only left is required */
+		else
+			return hashjoin; /* both sides are required */
+	}
+}
+
+static Plan *
+ExecTryRemoveNestedLoop(Plan *nestedloop, EState *estate)
+{
+	bool canRemoveLeft = false;
+	bool canRemoveRight = false;
+	Plan *leftnode = nestedloop->lefttree;
+	Plan *rightnode = nestedloop->righttree;
+
+	/*
+	 * If the left node is NULL, then most likely the node has already been
+	 * removed, in this case we can skip it
+	 */
+	if (leftnode == NULL)
+		canRemoveLeft = true;
+	else
+		canRemoveLeft = ExecCanSkipScanNode(leftnode, estate);
+
+	if (rightnode == NULL)
+		canRemoveRight = true;
+	else
+		canRemoveRight = ExecCanSkipScanNode(rightnode, estate);
+
+	if (canRemoveLeft)
+	{
+		if (canRemoveRight)
+			return NULL; /* this join is not required at all */
+		else
+			return rightnode;
+	}
+	else
+	{
+		if (canRemoveRight)
+			return leftnode; /* only left is required */
+		else
+			return nestedloop; /* both sides are required */
+	}
+}
+
+static Plan *
+ExecTryRemoveMergeJoin(Plan *mergejoin, EState *estate)
+{
+	bool canRemoveLeft = false;
+	bool canRemoveRight = false;
+	Plan *leftnode = mergejoin->lefttree;
+	Plan *rightnode = mergejoin->righttree;
+
+	/*
+	 * If the left node is NULL, then most likely the node has already been
+	 * removed, in this case we can skip it
+	 */
+	if (leftnode == NULL)
+		canRemoveLeft = true;
+	else
+	{
+		if (nodeTag(leftnode) == T_Sort)
+		{
+			/* move to the node where the merge join is getting tuples from */
+			leftnode = leftnode->lefttree;
+		}
+
+		canRemoveLeft = (leftnode == NULL) || ExecCanSkipScanNode(leftnode, estate);
+	}
+
+	if (rightnode == NULL)
+		canRemoveRight = true;
+	else
+	{
+		if (nodeTag(rightnode) == T_Sort)
+		{
+			/* move to the node where the merge join is getting tuples from */
+			rightnode = rightnode->lefttree;
+		}
+
+		/*
+		 * Check just in case the node from below the sort was already removed,
+		 * if it has then there's no point in this side of the join.
+		 */
+		if (rightnode == NULL)
+			canRemoveRight = true;
+		else
+			canRemoveRight = ExecCanSkipScanNode(rightnode, estate);
+	}
+
+	if (canRemoveLeft)
+	{
+		if (canRemoveRight)
+			return NULL; /* this join is not required at all */
+
+		/*
+		 * Right is required, skip left but maintain any sort nodes above the
+		 * scan node as sort order may be critical for the parent node.
+		 * XXX: Is there any way which we can check if the Sort order is
+		 * important to the parent?
+		 */
+		else
+			return mergejoin->righttree;
+	}
+	else
+	{
+		/* Left is required, but right is not, again keep the sort */
+		if (canRemoveRight)
+			return mergejoin->lefttree;
+		else
+			return mergejoin; /* both sides are required */
+	}
+}
+
+
+/*
+ * Reset a join node's targetlist Vars to remove the OUTER_VAR and INNER_VAR
+ * varnos
+ */
+static void
+ExecResetJoinVarnos(List *tlist)
+{
+	ListCell *lc;
+
+	foreach (lc, tlist)
+	{
+		TargetEntry *tle = (TargetEntry *) lfirst(lc);
+		Var *var = (Var *) tle->expr;
+
+		if (IsA(var, Var))
+			var->varno = var->varnoold;
+	}
+}
+
+/*
+ * Recursively process the plan tree to "move-up" nodes that sit beneath join
+ * nodes of any joins which are deemed unnecessary by the planner during the
+ * join removal process.
+ */
+void
+ExecImplodePlan(Plan **node, EState *estate)
+{
+	Plan *skippedToNode;
+	if (*node == NULL)
+		return;
+
+	/* visit each node recursively */
+	ExecImplodePlan(&(*node)->lefttree, estate);
+	ExecImplodePlan(&(*node)->righttree, estate);
+
+	switch (nodeTag(*node))
+	{
+		case T_HashJoin:
+			skippedToNode = ExecTryRemoveHashJoin(*node, estate);
+			break;
+		case T_NestLoop:
+			skippedToNode = ExecTryRemoveNestedLoop(*node, estate);
+			break;
+		case T_MergeJoin:
+			skippedToNode = ExecTryRemoveMergeJoin(*node, estate);
+			break;
+		default:
+			return;
+	}
+
+	/* both sides of join were removed, so we've nothing more to do here. */
+	if (skippedToNode == NULL)
+	{
+		*node = NULL;
+		return;
+	}
+
+	/*
+	 * If we've managed to move the node up a level, then we'd better also
+	 * replace the targetlist of the new node with that of the original node.
+	 * If we didn't do this then we might end up with columns in the result-set
+	 * that the query did not ask for.
+	 *
+	 * Also, since the original node was a join type node, the targetlist will
+	 * contain OUTER_VAR and INNER_VAR in place of the real varnos, so we must
+	 * put these back to what they should be.
+	 */
+	if (skippedToNode != *node)
+	{
+		/* FIXME: What else apart from Sort should not be changed? */
+		if (nodeTag(skippedToNode) != T_Sort)
+		{
+			ExecResetJoinVarnos((*node)->targetlist);
+			skippedToNode->targetlist = (*node)->targetlist;
+		}
+		*node = skippedToNode;
+	}
+}
+
 /* ----------------
  *		ExecAssignProjectionInfo
  *
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6b1bf7b..223bbc0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2025,6 +2025,7 @@ _copyRangeTblEntry(const RangeTblEntry *from)
 	COPY_SCALAR_FIELD(lateral);
 	COPY_SCALAR_FIELD(inh);
 	COPY_SCALAR_FIELD(inFromCl);
+	COPY_SCALAR_FIELD(skipJoinPossible);
 	COPY_SCALAR_FIELD(requiredPerms);
 	COPY_SCALAR_FIELD(checkAsUser);
 	COPY_BITMAPSET_FIELD(selectedCols);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index d5db71d..8e1d984 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2343,6 +2343,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b)
 	COMPARE_SCALAR_FIELD(lateral);
 	COMPARE_SCALAR_FIELD(inh);
 	COMPARE_SCALAR_FIELD(inFromCl);
+	COMPARE_SCALAR_FIELD(skipJoinPossible);
 	COMPARE_SCALAR_FIELD(requiredPerms);
 	COMPARE_SCALAR_FIELD(checkAsUser);
 	COMPARE_BITMAPSET_FIELD(selectedCols);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index edbd09f..783e2e9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2453,6 +2453,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node)
 	WRITE_BOOL_FIELD(lateral);
 	WRITE_BOOL_FIELD(inh);
 	WRITE_BOOL_FIELD(inFromCl);
+	WRITE_BOOL_FIELD(skipJoinPossible);
 	WRITE_UINT_FIELD(requiredPerms);
 	WRITE_OID_FIELD(checkAsUser);
 	WRITE_BITMAPSET_FIELD(selectedCols);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index a3efdd4..b74c0aa 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1250,6 +1250,7 @@ _readRangeTblEntry(void)
 	READ_BOOL_FIELD(lateral);
 	READ_BOOL_FIELD(inh);
 	READ_BOOL_FIELD(inFromCl);
+	READ_BOOL_FIELD(skipJoinPossible);
 	READ_UINT_FIELD(requiredPerms);
 	READ_OID_FIELD(checkAsUser);
 	READ_BITMAPSET_FIELD(selectedCols);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 9919d27..33f8a90 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -49,8 +49,6 @@ static List *generate_join_implied_equalities_broken(PlannerInfo *root,
 										Relids outer_relids,
 										Relids nominal_inner_relids,
 										RelOptInfo *inner_rel);
-static Oid select_equality_operator(EquivalenceClass *ec,
-						 Oid lefttype, Oid righttype);
 static RestrictInfo *create_join_clause(PlannerInfo *root,
 				   EquivalenceClass *ec, Oid opno,
 				   EquivalenceMember *leftem,
@@ -1282,7 +1280,7 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
  *
  * Returns InvalidOid if no operator can be found for this datatype combination
  */
-static Oid
+Oid
 select_equality_operator(EquivalenceClass *ec, Oid lefttype, Oid righttype)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index e99d416..2fb76f4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -32,13 +32,21 @@
 #include "utils/lsyscache.h"
 
 /* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					  RangeTblRef *removalrtr, Relids ignoredrels);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool relation_is_needed(PlannerInfo *root, Relids joinrelids,
+					  RelOptInfo *rel, Relids ignoredrels);
+static bool relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+					  RelOptInfo *referencedrel, List *referencing_vars,
+					  List *index_vars, List *operator_list);
+static bool expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					  List *indexvars, List *operators);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 					  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
 
-
 /*
  * remove_useless_joins
  *		Check for relations that don't actually need to be joined at all,
@@ -46,26 +54,94 @@ static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  *
  * We are passed the current joinlist and return the updated list.  Other
  * data structures that have to be updated are accessible via "root".
+ *
+ * There are 2 methods here for removing joins. Joins such as LEFT JOINs
+ * which can be proved to be needless due to lack of use of any of the joining
+ * relation's columns and the existence of a unique index on a subset of the
+ * join clause, can simply be removed from the query plan at plan time. For
+ * certain other join types we make use of foreign keys to attempt to prove the
+ * join is needless, though, for these we're unable to be certain that the join
+ * is not required at plan time, as if the plan is executed when pending
+ * foreign key triggers have not yet been fired, then the foreign key is
+ * effectively violated until these triggers have fired. Removing a join in
+ * such a case could cause a query to produce incorrect results.
+ *
+ * Instead we handle this case by marking the RangeTblEntry for the relation
+ * with a special flag which tells the executor that it's possible that joining
+ * to this relation may not be required. The executor may then check this flag
+ * and choose to skip the join based on if there are foreign key triggers
+ * pending or not.
  */
 List *
 remove_useless_joins(PlannerInfo *root, List *joinlist)
 {
 	ListCell   *lc;
+	Relids		removedrels = NULL;
 
 	/*
-	 * We are only interested in relations that are left-joined to, so we can
-	 * scan the join_info_list to find them easily.
+	 * Start by analyzing INNER JOINed relations in order to determine if any
+	 * of the relations can be ignored.
 	 */
 restart:
+	foreach(lc, joinlist)
+	{
+		RangeTblRef		*rtr = (RangeTblRef *) lfirst(lc);
+		RangeTblEntry	*rte;
+
+		if (!IsA(rtr, RangeTblRef))
+			continue;
+
+		rte = root->simple_rte_array[rtr->rtindex];
+
+		/* Don't try to remove this one again if we've already removed it */
+		if (rte->skipJoinPossible == true)
+			continue;
+
+		/* skip if the join can't be removed */
+		if (!innerjoin_is_removable(root, joinlist, rtr, removedrels))
+			continue;
+
+		/*
+		 * Since we're not actually removing the join here, we need to maintain
+		 * a list of relations that we've "removed" so when we're checking if
+		 * other relations can be removed we'll know that if the to be removed
+		 * relation is only referenced by a relation that we've already removed
+		 * that it can be safely assumed that the relation is not referenced by
+		 * any useful relation.
+		 */
+		removedrels = bms_add_member(removedrels, rtr->rtindex);
+
+		/*
+		 * Make a mark for the executor to say that it may be able to skip
+		 * joining to this relation.
+		 */
+		rte->skipJoinPossible = true;
+
+		/*
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that since we've added
+		 * this relation to the removedrels, we may now realize that other
+		 * relations can also be removed as they're only referenced by the one
+		 * that we've just marked as possibly removable).
+		 */
+		goto restart;
+	}
+
+	/* now process special joins. Currently only left joins are supported */
 	foreach(lc, root->join_info_list)
 	{
 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
 		int			innerrelid;
 		int			nremoved;
 
-		/* Skip if not removable */
-		if (!join_is_removable(root, sjinfo))
-			continue;
+		if (sjinfo->jointype == JOIN_LEFT)
+		{
+			/* Skip if not removable */
+			if (!leftjoin_is_removable(root, sjinfo))
+				continue;
+		}
+		else
+			continue; /* we don't support this join type */
 
 		/*
 		 * Currently, join_is_removable can only succeed when the sjinfo's
@@ -91,12 +167,11 @@ restart:
 		root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
 
 		/*
-		 * Restart the scan.  This is necessary to ensure we find all
-		 * removable joins independently of ordering of the join_info_list
-		 * (note that removal of attr_needed bits may make a join appear
-		 * removable that did not before).  Also, since we just deleted the
-		 * current list cell, we'd have to have some kluge to continue the
-		 * list scan anyway.
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that removal of
+		 * attr_needed bits may make a join, inner or outer, appear removable
+		 * that did not before).   Also, since we just deleted the current list
+		 * cell, we'd have to have some kluge to continue the list scan anyway.
 		 */
 		goto restart;
 	}
@@ -136,8 +211,213 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
 }
 
 /*
- * join_is_removable
- *	  Check whether we need not perform this special join at all, because
+ * innerjoin_is_removable
+ *		True if the join to removalrtr can be removed.
+ *
+ * In order to prove a relation which is inner joined is not required we must
+ * be sure that the join would emit exactly 1 row on the join condition. This
+ * differs from the logic which is used for proving LEFT JOINs can be removed,
+ * where it's possible to just check that a unique index exists on the relation
+ * being removed which has a set of columns that is a subset of the columns
+ * seen in the join condition. If no matching row is found then left join would
+ * not remove the non-matched row from the result set. This is not the case
+ * with INNER JOINs, so here we must use foreign keys as proof that the 1 row
+ * exists before we can allow any joins to be removed.
+ */
+static bool
+innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					   RangeTblRef *removalrtr, Relids ignoredrels)
+{
+	ListCell   *lc;
+	RelOptInfo *removalrel;
+
+	removalrel = find_base_rel(root, removalrtr->rtindex);
+
+	/*
+	 * As foreign keys may only reference base rels which have unique indexes,
+	 * we needn't go any further if we're not dealing with a base rel, or if
+	 * the base rel has no unique indexes. We'd also better abort if the
+	 * rtekind is anything but a relation, as things like sub-queries may have
+	 * grouping or distinct clauses that would cause us not to be able to use
+	 * the foreign key to prove the existence of a row matching the join
+	 * condition. We also abort if the rel has no eclass joins as such a rel
+	 * could well be joined using some operator which is not an equality
+	 * operator, or the rel may not even be inner joined at all.
+	 *
+	 * Here we actually only check if the rel has any indexes, ideally we'd be
+	 * checking for unique indexes, but we could only determine that by looping
+	 * over the indexlist, and this is likely too expensive a check to be worth
+	 * it here.
+	 */
+	if (removalrel->reloptkind != RELOPT_BASEREL ||
+		removalrel->rtekind != RTE_RELATION ||
+		removalrel->has_eclass_joins == false ||
+		removalrel->indexlist == NIL)
+		return false;
+
+	/*
+	 * Currently we disallow the removal if we find any baserestrictinfo items
+	 * on the relation being removed. The reason for this is that these would
+	 * filter out rows and make it so the foreign key cannot prove that we'll
+	 * match exactly 1 row on the join condition. However, this check is
+	 * currently probably a bit overly strict as it should be possible to just
+	 * check and ensure that each Var seen in the baserestrictinfo is also
+	 * present in an eclass and if so, just translate and move the whole
+	 * baserestrictinfo over to the relation which has the foreign key to prove
+	 * that this join is not needed. e.g:
+	 * SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id WHERE b.id = 1;
+	 * could become: SELECT a.* FROM a WHERE a.b_id = 1;
+	 */
+	if (removalrel->baserestrictinfo != NIL)
+		return false;
+
+	/*
+	 * Currently only eclass joins are supported, so if there are any non
+	 * eclass join quals then we'll report the join is non-removable.
+	 */
+	if (removalrel->joininfo != NIL)
+		return false;
+
+	/*
+	 * Now we'll search through each relation in the joinlist to see if we can
+	 * find a relation which has a foreign key which references removalrel on
+	 * the join condition. If we find a rel with a foreign key which matches
+	 * the join condition exactly, then we can be sure that exactly 1 row will
+	 * be matched on the join, if we also see that no Vars from the relation
+	 * are needed, then we can report the join as removable.
+	 */
+	foreach (lc, joinlist)
+	{
+		RangeTblRef	*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo	*rel;
+		ListCell	*lc2;
+		List		*referencing_vars;
+		List		*index_vars;
+		List		*operator_list;
+		Relids		 joinrelids;
+
+		/* we can't remove ourself, or anything other than RangeTblRefs */
+		if (rtr == removalrtr || !IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = find_base_rel(root, rtr->rtindex);
+
+		/*
+		 * The only relation type that can help us is a base rel with at least
+		 * one foreign key defined, if there's no eclass joins then this rel
+		 * is not going to help us prove the removalrel is not needed.
+		 */
+		if (rel->reloptkind != RELOPT_BASEREL ||
+			rel->rtekind != RTE_RELATION ||
+			rel->has_eclass_joins == false ||
+			rel->fklist == NIL)
+			continue;
+
+		/*
+		 * Both rels have eclass joins, but do they have eclass joins to each
+		 * other? Skip this rel if it does not.
+		 */
+		if (!have_relevant_eclass_joinclause(root, rel, removalrel))
+			continue;
+
+		joinrelids = bms_union(rel->relids, removalrel->relids);
+
+		/* if any of the Vars from the relation are needed then abort */
+		if (relation_is_needed(root, joinrelids, removalrel, ignoredrels))
+			return false;
+
+		referencing_vars = NIL;
+		index_vars = NIL;
+		operator_list = NIL;
+
+		/* now populate the lists with the join condition Vars */
+		foreach(lc2, root->eq_classes)
+		{
+			EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc2);
+
+			if (list_length(ec->ec_members) <= 1)
+				continue;
+
+			if (bms_overlap(removalrel->relids, ec->ec_relids) &&
+				bms_overlap(rel->relids, ec->ec_relids))
+			{
+				ListCell *lc3;
+				Var *refvar = NULL;
+				Var *idxvar = NULL;
+
+				/*
+				 * Look at each member of the eclass and try to find a Var from
+				 * each side of the join that we can append to the list of
+				 * columns that should be checked against each foreign key.
+				 *
+				 * The following logic does not allow for join removals to take
+				 * place for foreign keys that have duplicate columns on the
+				 * referencing side of the foreign key, such as:
+				 * (a,a) references (x,y)
+				 * The use case for such a foreign key is likely small enough
+				 * that we needn't bother making this code any more complex to
+				 * solve. If we find more than 1 Var from any of the rels then
+				 * we'll bail out.
+				 */
+				foreach (lc3, ec->ec_members)
+				{
+					EquivalenceMember *ecm = (EquivalenceMember *) lfirst(lc3);
+
+					Var *var = (Var *) ecm->em_expr;
+
+					if (!IsA(var, Var))
+						continue; /* Ignore Consts */
+
+					if (var->varno == rel->relid)
+					{
+						if (refvar != NULL)
+							return false;
+						refvar = var;
+					}
+
+					else if (var->varno == removalrel->relid)
+					{
+						if (idxvar != NULL)
+							return false;
+						idxvar = var;
+					}
+				}
+
+				if (refvar != NULL && idxvar != NULL)
+				{
+					Oid opno;
+					Oid reloid = root->simple_rte_array[refvar->varno]->relid;
+
+					if (!get_attnotnull(reloid, refvar->varattno))
+						return false;
+
+					/* grab the correct equality operator for these two vars */
+					opno = select_equality_operator(ec, refvar->vartype, idxvar->vartype);
+
+					if (!OidIsValid(opno))
+						return false;
+
+					referencing_vars = lappend(referencing_vars, refvar);
+					index_vars = lappend(index_vars, idxvar);
+					operator_list = lappend_oid(operator_list, opno);
+				}
+			}
+		}
+
+		if (referencing_vars != NULL)
+		{
+			if (relation_has_foreign_key_for(root, rel, removalrel,
+				referencing_vars, index_vars, operator_list))
+				return true; /* removalrel can be removed */
+		}
+	}
+
+	return false; /* can't remove join */
+}
+
+/*
+ * leftjoin_is_removable
+ *	  Check whether we need not perform this left join at all, because
  *	  it will just duplicate its left input.
  *
  * This is true for a left join for which the join condition cannot match
@@ -147,7 +427,7 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
  * above the join.
  */
 static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 {
 	int			innerrelid;
 	RelOptInfo *innerrel;
@@ -155,14 +435,14 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	Relids		joinrelids;
 	List	   *clause_list = NIL;
 	ListCell   *l;
-	int			attroff;
+
+	Assert(sjinfo->jointype == JOIN_LEFT);
 
 	/*
-	 * Must be a non-delaying left join to a single baserel, else we aren't
+	 * Must be a non-delaying join to a single baserel, else we aren't
 	 * going to be able to do anything with it.
 	 */
-	if (sjinfo->jointype != JOIN_LEFT ||
-		sjinfo->delay_upper_joins)
+	if (sjinfo->delay_upper_joins)
 		return false;
 
 	if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
@@ -206,52 +486,9 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
-	/*
-	 * We can't remove the join if any inner-rel attributes are used above the
-	 * join.
-	 *
-	 * Note that this test only detects use of inner-rel attributes in higher
-	 * join conditions and the target list.  There might be such attributes in
-	 * pushed-down conditions at this join, too.  We check that case below.
-	 *
-	 * As a micro-optimization, it seems better to start with max_attr and
-	 * count down rather than starting with min_attr and counting up, on the
-	 * theory that the system attributes are somewhat less likely to be wanted
-	 * and should be tested last.
-	 */
-	for (attroff = innerrel->max_attr - innerrel->min_attr;
-		 attroff >= 0;
-		 attroff--)
-	{
-		if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids))
-			return false;
-	}
-
-	/*
-	 * Similarly check that the inner rel isn't needed by any PlaceHolderVars
-	 * that will be used above the join.  We only need to fail if such a PHV
-	 * actually references some inner-rel attributes; but the correct check
-	 * for that is relatively expensive, so we first check against ph_eval_at,
-	 * which must mention the inner rel if the PHV uses any inner-rel attrs as
-	 * non-lateral references.  Note that if the PHV's syntactic scope is just
-	 * the inner rel, we can't drop the rel even if the PHV is variable-free.
-	 */
-	foreach(l, root->placeholder_list)
-	{
-		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
-
-		if (bms_is_subset(phinfo->ph_needed, joinrelids))
-			continue;			/* PHV is not used above the join */
-		if (bms_overlap(phinfo->ph_lateral, innerrel->relids))
-			return false;		/* it references innerrel laterally */
-		if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids))
-			continue;			/* it definitely doesn't reference innerrel */
-		if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
-			return false;		/* there isn't any other place to eval PHV */
-		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
-						innerrel->relids))
-			return false;		/* it does reference innerrel */
-	}
+	/* if the relation is referenced in the query then it cannot be removed */
+	if (relation_is_needed(root, joinrelids, innerrel, NULL))
+		return false;
 
 	/*
 	 * Search for mergejoinable clauses that constrain the inner rel against
@@ -368,6 +605,218 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	return false;
 }
 
+/*
+ * relation_is_needed
+ *		True if any of the Vars from this relation are required in the query
+ */
+static inline bool
+relation_is_needed(PlannerInfo *root, Relids joinrelids, RelOptInfo *rel, Relids ignoredrels)
+{
+	int		  attroff;
+	ListCell *l;
+
+	/*
+	 * rel is referenced if any of its attributes are used above the join.
+	 *
+	 * Note that this test only detects use of rel's attributes in higher
+	 * join conditions and the target list.  There might be such attributes in
+	 * pushed-down conditions at this join, too.  We check that case below.
+	 *
+	 * As a micro-optimization, it seems better to start with max_attr and
+	 * count down rather than starting with min_attr and counting up, on the
+	 * theory that the system attributes are somewhat less likely to be wanted
+	 * and should be tested last.
+	 */
+	for (attroff = rel->max_attr - rel->min_attr;
+		 attroff >= 0;
+		 attroff--)
+	{
+		if (!bms_is_subset(bms_difference(rel->attr_needed[attroff], ignoredrels), joinrelids))
+			return true;
+	}
+
+	/*
+	 * Similarly check that rel isn't needed by any PlaceHolderVars that will
+	 * be used above the join.  We only need to fail if such a PHV actually
+	 * references some of rel's attributes; but the correct check for that is
+	 * relatively expensive, so we first check against ph_eval_at, which must
+	 * mention rel if the PHV uses any of rel's attrs as non-lateral
+	 * references.  Note that if the PHV's syntactic scope is just rel, we
+	 * can't return false even if the PHV is variable-free.
+	 */
+	foreach(l, root->placeholder_list)
+	{
+		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
+
+		if (bms_is_subset(phinfo->ph_needed, joinrelids))
+			continue;			/* PHV is not used above the join */
+		if (bms_overlap(phinfo->ph_lateral, rel->relids))
+			return true;		/* it references rel laterally */
+		if (!bms_overlap(phinfo->ph_eval_at, rel->relids))
+			continue;			/* it definitely doesn't reference rel */
+		if (bms_is_subset(phinfo->ph_eval_at, rel->relids))
+			return true;		/* there isn't any other place to eval PHV */
+		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
+						rel->relids))
+			return true;		/* it does reference rel */
+	}
+
+	return false; /* it does not reference rel */
+}
+
+/*
+ * relation_has_foreign_key_for
+ *	  Checks if rel has a foreign key which references referencedrel with the
+ *	  given list of expressions.
+ *
+ *	For the match to succeed:
+ *	  referencing_vars must match the columns defined in the foreign key.
+ *	  index_vars must match the columns defined in the index for the foreign key.
+ */
+static bool
+relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+			RelOptInfo *referencedrel, List *referencing_vars,
+			List *index_vars, List *operator_list)
+{
+	ListCell *lc;
+	Oid		  refreloid;
+
+	/*
+	 * Look up the Oid of the referenced relation. We only want to look at
+	 * foreign keys on the referencing relation which reference this relation.
+	 */
+	refreloid = root->simple_rte_array[referencedrel->relid]->relid;
+
+	Assert(list_length(referencing_vars) > 0);
+	Assert(list_length(referencing_vars) == list_length(index_vars));
+	Assert(list_length(referencing_vars) == list_length(operator_list));
+
+	/*
+	 * Search through each foreign key on the referencing relation and try
+	 * to find one which references the relation in the join condition. If we
+	 * find one then we'll send the join conditions off to
+	 * expressions_match_foreign_key() to see if they match the foreign key.
+	 */
+	foreach(lc, rel->fklist)
+	{
+		ForeignKeyInfo *fk = (ForeignKeyInfo *) lfirst(lc);
+
+		if (fk->confrelid == refreloid)
+		{
+			if (expressions_match_foreign_key(fk, referencing_vars,
+				index_vars, operator_list))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * expressions_match_foreign_key
+ *		True if the given fkvars, indexvars and operators will match
+ *		exactly 1 record in the referenced relation of the foreign key.
+ *
+ * Note: This function expects fkvars and indexvars to only contain Var types.
+ *		 Expression indexes are not supported by foreign keys.
+ */
+static bool
+expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					List *indexvars, List *operators)
+{
+	ListCell  *lc;
+	ListCell  *lc2;
+	ListCell  *lc3;
+	Bitmapset *allitems;
+	Bitmapset *matcheditems;
+	int		   lstidx;
+	int		   col;
+
+	Assert(list_length(fkvars) == list_length(indexvars));
+	Assert(list_length(fkvars) == list_length(operators));
+
+	/*
+	 * Fast path out if there are not enough conditions to match each column in
+	 * the foreign key. Note that we cannot check that the number of
+	 * expressions are equal here since it would cause any expressions which
+	 * are duplicated not to match.
+	 */
+	if (list_length(fkvars) < fk->conncols)
+		return false;
+
+	/*
+	 * We need to ensure that each foreign key column can be matched to a list
+	 * item, and we need to ensure that each list item can be matched to a
+	 * foreign key column. We do this by looping over each foreign key column
+	 * and checking that we can find an item in the list which matches the
+	 * current column, however this method does not allow us to ensure that no
+	 * additional items exist in the list. We could solve that by performing
+	 * another loop over each list item and check that it matches a foreign key
+	 * column, but that's a bit wasteful. Instead we'll use 2 bitmapsets, one
+	 * to store the 0 based index of each list item, and with the other we'll
+	 * store each list index that we've managed to match. After we're done
+	 * matching we'll just make sure that both bitmapsets are equal.
+	 */
+	allitems = NULL;
+	matcheditems = NULL;
+
+	/*
+	 * Build a bitmapset which contains each 0 based list index. It seems more
+	 * efficient to do this in reverse so that we allocate enough memory for
+	 * the bitmapset on first loop rather than reallocating each time we find
+	 * we need a bit more space.
+	 */
+	for (lstidx = list_length(fkvars) - 1; lstidx >= 0; lstidx--)
+		allitems = bms_add_member(allitems, lstidx);
+
+	for (col = 0; col < fk->conncols; col++)
+	{
+		bool  matched = false;
+
+		lstidx = 0;
+
+		forthree(lc, fkvars, lc2, indexvars, lc3, operators)
+		{
+			Var *expr = (Var *) lfirst(lc);
+			Var *idxexpr = (Var *) lfirst(lc2);
+			Oid  opr = lfirst_oid(lc3);
+
+			Assert(IsA(expr, Var));
+			Assert(IsA(idxexpr, Var));
+
+			/* Does this join qual match up to the current fkey column? */
+			if (fk->conkey[col] == expr->varattno &&
+				fk->confkey[col] == idxexpr->varattno &&
+				equality_ops_are_compatible(opr, fk->conpfeqop[col]))
+			{
+				matched = true;
+
+				/* mark this list item as matched */
+				matcheditems = bms_add_member(matcheditems, lstidx);
+
+				/*
+				 * Don't break here as there may be duplicate expressions
+				 * that we also need to match against.
+				 */
+			}
+			lstidx++;
+		}
+
+		/* punt if there's no match. */
+		if (!matched)
+			return false;
+	}
+
+	/*
+	 * Ensure that we managed to match every item in the list to a foreign key
+	 * column.
+	 */
+	if (!bms_equal(allitems, matcheditems))
+		return false;
+
+	return true; /* matched */
+}
+
 
 /*
  * Remove the target relid from the planner's data structures, having
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f752ecc..a783f14 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3712,6 +3712,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid)
 	rte->lateral = false;
 	rte->inh = false;
 	rte->inFromCl = true;
+	rte->skipJoinPossible = false;
 	query->rtable = list_make1(rte);
 
 	/* Set up RTE/RelOptInfo arrays */
diff --git a/src/backend/optimizer/prep/prepsecurity.c b/src/backend/optimizer/prep/prepsecurity.c
index b625b5c..74a0dca 100644
--- a/src/backend/optimizer/prep/prepsecurity.c
+++ b/src/backend/optimizer/prep/prepsecurity.c
@@ -311,6 +311,7 @@ expand_security_qual(PlannerInfo *root, List *tlist, int rt_index,
 			subrte->security_barrier = rte->security_barrier;
 			subrte->eref = copyObject(rte->eref);
 			subrte->inFromCl = true;
+			subrte->skipJoinPossible = false;
 			subquery->rtable = list_make1(subrte);
 
 			subrtr = makeNode(RangeTblRef);
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..fea198e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -25,7 +25,9 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_constraint.h"
 #include "catalog/heap.h"
+#include "catalog/pg_type.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -38,6 +40,7 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -89,6 +92,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *fkinfos = NIL;
+	Relation	fkeyRel;
+	Relation	fkeyRelIdx;
+	ScanKeyData fkeyScankey;
+	SysScanDesc fkeyScan;
+	HeapTuple	tuple;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -384,6 +393,111 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	heap_close(relation, NoLock);
 
+	/* load foreign key constraints */
+	ScanKeyInit(&fkeyScankey,
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relationObjectId));
+
+	fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
+	fkeyRelIdx = index_open(ConstraintRelidIndexId, AccessShareLock);
+	fkeyScan = systable_beginscan_ordered(fkeyRel, fkeyRelIdx, NULL, 1, &fkeyScankey);
+
+	while ((tuple = systable_getnext_ordered(fkeyScan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
+		ForeignKeyInfo *fkinfo;
+		Datum		adatum;
+		bool		isNull;
+		ArrayType  *arr;
+		int			nelements;
+
+		/* skip if not a foreign key */
+		if (con->contype != CONSTRAINT_FOREIGN)
+			continue;
+
+		/* we're not interested unless the fkey has been validated */
+		if (!con->convalidated)
+			continue;
+
+		fkinfo = (ForeignKeyInfo *) palloc(sizeof(ForeignKeyInfo));
+		fkinfo->conindid = con->conindid;
+		fkinfo->confrelid = con->confrelid;
+		fkinfo->convalidated = con->convalidated;
+		fkinfo->conrelid = con->conrelid;
+		fkinfo->confupdtype = con->confupdtype;
+		fkinfo->confdeltype = con->confdeltype;
+		fkinfo->confmatchtype = con->confmatchtype;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "conkey is not a 1-D smallint array");
+
+		fkinfo->conkey = (int16 *) ARR_DATA_PTR(arr);
+		fkinfo->conncols = nelements;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_confkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null confkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "confkey is not a 1-D smallint array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of confkey elements does not equal conkey elements");
+
+		fkinfo->confkey = (int16 *) ARR_DATA_PTR(arr);
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conpfeqop,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conpfeqop for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != OIDOID)
+			elog(ERROR, "conpfeqop is not a 1-D Oid array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of conpfeqop elements does not equal conkey elements");
+
+		fkinfo->conpfeqop = (Oid *) ARR_DATA_PTR(arr);
+
+		fkinfos = lappend(fkinfos, fkinfo);
+	}
+
+	rel->fklist = fkinfos;
+	systable_endscan_ordered(fkeyScan);
+	index_close(fkeyRelIdx, AccessShareLock);
+	heap_close(fkeyRel, AccessShareLock);
+
 	/*
 	 * Allow a plugin to editorialize on the info we obtained from the
 	 * catalogs.  Actions might include altering the assumed relation size,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4c76f54..58d80bb 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -115,6 +115,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->lateral_relids = NULL;
 	rel->lateral_referencers = NULL;
 	rel->indexlist = NIL;
+	rel->fklist = NIL;
 	rel->pages = 0;
 	rel->tuples = 0;
 	rel->allvisfrac = 0;
@@ -377,6 +378,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->lateral_relids = NULL;
 	joinrel->lateral_referencers = NULL;
 	joinrel->indexlist = NIL;
+	joinrel->fklist = NIL;
 	joinrel->pages = 0;
 	joinrel->tuples = 0;
 	joinrel->allvisfrac = 0;
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 478584d..cafeba9 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -1048,6 +1048,7 @@ addRangeTableEntry(ParseState *pstate,
 	rte->lateral = false;
 	rte->inh = inh;
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = ACL_SELECT;
 	rte->checkAsUser = InvalidOid;		/* not set-uid by default, either */
@@ -1101,6 +1102,7 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	rte->lateral = false;
 	rte->inh = inh;
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = ACL_SELECT;
 	rte->checkAsUser = InvalidOid;		/* not set-uid by default, either */
@@ -1179,6 +1181,7 @@ addRangeTableEntryForSubquery(ParseState *pstate,
 	rte->lateral = lateral;
 	rte->inh = false;			/* never true for subqueries */
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = 0;
 	rte->checkAsUser = InvalidOid;
@@ -1433,6 +1436,7 @@ addRangeTableEntryForFunction(ParseState *pstate,
 	rte->lateral = lateral;
 	rte->inh = false;			/* never true for functions */
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = 0;
 	rte->checkAsUser = InvalidOid;
@@ -1505,6 +1509,7 @@ addRangeTableEntryForValues(ParseState *pstate,
 	rte->lateral = lateral;
 	rte->inh = false;			/* never true for values RTEs */
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = 0;
 	rte->checkAsUser = InvalidOid;
@@ -1573,6 +1578,7 @@ addRangeTableEntryForJoin(ParseState *pstate,
 	rte->lateral = false;
 	rte->inh = false;			/* never true for joins */
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = 0;
 	rte->checkAsUser = InvalidOid;
@@ -1673,6 +1679,7 @@ addRangeTableEntryForCTE(ParseState *pstate,
 	rte->lateral = false;
 	rte->inh = false;			/* never true for subqueries */
 	rte->inFromCl = inFromCl;
+	rte->skipJoinPossible = false;
 
 	rte->requiredPerms = 0;
 	rte->checkAsUser = InvalidOid;
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 24ade6c..11ab914 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -843,6 +843,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 		oldrte->lateral = false;
 		oldrte->inh = false;
 		oldrte->inFromCl = true;
+		oldrte->skipJoinPossible = false;
 
 		newrte = makeNode(RangeTblEntry);
 		newrte->rtekind = RTE_RELATION;
@@ -853,6 +854,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty)
 		newrte->lateral = false;
 		newrte->inh = false;
 		newrte->inFromCl = true;
+		newrte->skipJoinPossible = false;
 
 		/* Build two-element rtable */
 		memset(&dpns, 0, sizeof(dpns));
@@ -2508,6 +2510,7 @@ deparse_context_for(const char *aliasname, Oid relid)
 	rte->lateral = false;
 	rte->inh = false;
 	rte->inFromCl = true;
+	rte->skipJoinPossible = false;
 
 	/* Build one-element rtable */
 	dpns->rtable = list_make1(rte);
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 73138e0..db0f90a 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -916,6 +916,33 @@ get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 	ReleaseSysCache(tp);
 }
 
+/*
+ * get_attnotnull
+ *
+ *		Given the relation id and the attribute number,
+ *		return the "attnotnull" field from the attribute relation.
+ */
+bool
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache2(ATTNUM,
+						 ObjectIdGetDatum(relid),
+						 Int16GetDatum(attnum));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+		bool		result;
+
+		result = att_tup->attnotnull;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return false;
+}
+
 /*				---------- COLLATION CACHE ----------					 */
 
 /*
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index d0b0356..34a75e4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -181,6 +181,7 @@ extern void ExecBSTruncateTriggers(EState *estate,
 extern void ExecASTruncateTriggers(EState *estate,
 					   ResultRelInfo *relinfo);
 
+extern bool AfterTriggerQueueIsEmpty(void);
 extern void AfterTriggerBeginXact(void);
 extern void AfterTriggerBeginQuery(void);
 extern void AfterTriggerEndQuery(EState *estate);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed3ae39..1c2ef45 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -340,6 +340,7 @@ extern ProjectionInfo *ExecBuildProjectionInfo(List *targetList,
 						ExprContext *econtext,
 						TupleTableSlot *slot,
 						TupleDesc inputDesc);
+extern void ExecImplodePlan(Plan **planstate, EState *estate);
 extern void ExecAssignProjectionInfo(PlanState *planstate,
 						 TupleDesc inputDesc);
 extern void ExecFreeExprContext(PlanState *planstate);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 255415d..152f1bb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -813,6 +813,8 @@ typedef struct RangeTblEntry
 	bool		lateral;		/* subquery, function, or values is LATERAL? */
 	bool		inh;			/* inheritance requested? */
 	bool		inFromCl;		/* present in FROM clause? */
+	bool		skipJoinPossible; /* it may be possible to not bother joining
+								   * this relation at all */
 	AclMode		requiredPerms;	/* bitmask of required access permissions */
 	Oid			checkAsUser;	/* if valid, check access as this role */
 	Bitmapset  *selectedCols;	/* columns needing SELECT permission */
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..56918ab 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -359,6 +359,8 @@ typedef struct PlannerInfo
  *		lateral_referencers - relids of rels that reference this one laterally
  *		indexlist - list of IndexOptInfo nodes for relation's indexes
  *					(always NIL if it's not a table)
+ *		fklist - list of ForeignKeyInfo's for relation's foreign key
+ *					constraints. (always NIL if it's not a table)
  *		pages - number of disk pages in relation (zero if not a table)
  *		tuples - number of tuples in relation (not considering restrictions)
  *		allvisfrac - fraction of disk pages that are marked all-visible
@@ -452,6 +454,7 @@ typedef struct RelOptInfo
 	Relids		lateral_relids; /* minimum parameterization of rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
 	List	   *indexlist;		/* list of IndexOptInfo */
+	List	   *fklist;			/* list of ForeignKeyInfo */
 	BlockNumber pages;			/* size estimates derived from pg_class */
 	double		tuples;
 	double		allvisfrac;
@@ -542,6 +545,51 @@ typedef struct IndexOptInfo
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 } IndexOptInfo;
 
+/*
+ * ForeignKeyInfo
+ *		Used to store pg_constraint records for foreign key constraints for use
+ *		by the planner.
+ *
+ *		conindid - The index which supports the foreign key
+ *
+ *		confrelid - The relation that is referenced by this foreign key
+ *
+ *		convalidated - True if the foreign key has been validated.
+ *
+ *		conrelid - The Oid of the relation that the foreign key belongs to
+ *
+ *		confupdtype - ON UPDATE action for when the referenced table is updated
+ *
+ *		confdeltype - ON DELETE action, controls what to do when a record is
+ *					deleted from the referenced table.
+ *
+ *		confmatchtype - foreign key match type, e.g. MATCH FULL, MATCH PARTIAL
+ *
+ *		conncols - Number of columns defined in the foreign key
+ *
+ *		conkey - An array of conncols elements to store the varattno of the
+ *					columns on the referencing side of the foreign key
+ *
+ *		confkey - An array of conncols elements to store the varattno of the
+ *					columns on the referenced side of the foreign key
+ *
+ *		conpfeqop - An array of conncols elements to store the operators for
+ *					PK = FK comparisons
+ */
+typedef struct ForeignKeyInfo
+{
+	Oid			conindid;		/* index supporting this constraint */
+	Oid			confrelid;		/* relation referenced by foreign key */
+	bool		convalidated;	/* constraint has been validated? */
+	Oid			conrelid;		/* relation this constraint constrains */
+	char		confupdtype;	/* foreign key's ON UPDATE action */
+	char		confdeltype;	/* foreign key's ON DELETE action */
+	char		confmatchtype;	/* foreign key's match type */
+	int			conncols;		/* number of columns in the foreign key */
+	int16	   *conkey;			/* Columns of conrelid that the constraint applies to */
+	int16	   *confkey;		/* columns of confrelid that foreign key references */
+	Oid		   *conpfeqop;		/* Operator list for comparing PK to FK */
+} ForeignKeyInfo;
 
 /*
  * EquivalenceClasses
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index afa5f9b..6dada00 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -119,6 +119,8 @@ extern List *generate_join_implied_equalities(PlannerInfo *root,
 								 Relids join_relids,
 								 Relids outer_relids,
 								 RelOptInfo *inner_rel);
+extern Oid select_equality_operator(EquivalenceClass *ec, Oid lefttype,
+								 Oid righttype);
 extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 						   AppendRelInfo *appinfo,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 1a556f8..1cb9970 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -68,6 +68,7 @@ extern Oid	get_atttype(Oid relid, AttrNumber attnum);
 extern int32 get_atttypmod(Oid relid, AttrNumber attnum);
 extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 					  Oid *typid, int32 *typmod, Oid *collid);
+extern bool get_attnotnull(Oid relid, AttrNumber attnum);
 extern char *get_collation_name(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2501184..7d44739 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3276,6 +3276,197 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 (1 row)
 
 rollback;
+begin work;
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+ id | b_id 
+----+------
+  3 |    2
+(1 row)
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (val = 10)
+(6 rows)
+
+-- check merge join nodes are removed properly
+set enable_hashjoin = off;
+-- this should remove joins to b and c.
+explain (costs off)
+select count(*) from a inner join b on a.b_id = b.id left join c on a.id = c.id;
+        QUERY PLAN         
+---------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: a.b_id
+         ->  Seq Scan on a
+(4 rows)
+
+-- this should remove joins to b and c, however it b will only be removed on
+-- 2nd attempt after c is removed by the left join removal code.
+explain (costs off)
+select count(*) from a inner join b on a.b_id = b.id left join c on b.id = c.id;
+        QUERY PLAN         
+---------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: a.b_id
+         ->  Seq Scan on a
+(4 rows)
+
+set enable_hashjoin = on;
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (id = val)
+(6 rows)
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+         QUERY PLAN         
+----------------------------
+ Hash Join
+   Hash Cond: (a.id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+          QUERY PLAN          
+------------------------------
+ Hash Right Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+        QUERY PLAN         
+---------------------------
+ Nested Loop
+   ->  Seq Scan on c
+   ->  Materialize
+         ->  Seq Scan on a
+(4 rows)
+
+set constraints b_c_id_fkey deferred;
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+explain (costs off)
+execute ab;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+explain (costs off)
+execute ab;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+rollback;
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 718e1d9..8591050 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -977,6 +977,103 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 
 rollback;
 
+begin work;
+
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+
+-- check merge join nodes are removed properly
+set enable_hashjoin = off;
+
+-- this should remove joins to b and c.
+explain (costs off)
+select count(*) from a inner join b on a.b_id = b.id left join c on a.id = c.id;
+
+-- this should remove joins to b and c, however it b will only be removed on
+-- 2nd attempt after c is removed by the left join removal code.
+explain (costs off)
+select count(*) from a inner join b on a.b_id = b.id left join c on b.id = c.id;
+
+set enable_hashjoin = on;
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+
+set constraints b_c_id_fkey deferred;
+
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+rollback;
+
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);

Mart Kelder

mart@kelder31.nl

about 11 years ago

In reply to: David Rowley (#1)

Re: Removing INNER JOINs

Hi David (and others),

David Rowley wrote:

Hi,

Starting a new thread which continues on from
/messages/by-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com

To give a brief summary for any new readers:

The attached patch allows for INNER JOINed relations to be removed from
the plan, providing none of the columns are used for anything, and a
foreign key exists which proves that a record must exist in the table
being removed which matches the join condition:

I'm looking for a bit of feedback around the method I'm using to prune the
redundant plan nodes out of the plan tree at executor startup.
Particularly around not stripping the Sort nodes out from below a merge
join, even if the sort order is no longer required due to the merge join
node being removed. This potentially could leave the plan suboptimal when
compared to a plan that the planner could generate when the removed
relation was never asked for in the first place.

I did read this patch (and the previous patch about removing SEMI-joins)
with great interest. I don't know the code well enough to say much about the
patch itself, but I hope to have some useful ideas about the global
process.

I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions during planning are useful. I think there are
more possibilities for this kind of assumption (for example unique
constraints, empty tables).

There are some more details around the reasons behind doing this weird
executor startup plan pruning around here:

/messages/by-id/20141006145957.GA20577@awork2.anarazel.de

The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing a
join in the executor.

Once a plan is made, you know which assumptions the final plan relies on.
In this case, the assumption is that a foreign key is still valid. In
general, there are a lot more assumptions, such as the continued existence
of an index or of columns. There are also soft assumptions, such as
assuming that the statistics used are still reasonable.

My suggestion is to check the assumptions at the start of executor. If they
still hold, you can just execute the plan as it is.

If one or more assumptions don't hold, there are a couple of things you
might do:
* Make a new plan. The plan is certain to match all conditions because at
that time, a snapshot is already taken.
* Check the assumption. This can be a costly operation with no guarantee of
success.
* Change the existing plan to not rely on the failed assumption.
* Use an already stored alternate plan (generated during the initial planning).

You currently change the plan in executor code. I suggest going back to the
planner if the assumption doesn't hold. The planner can then decide to change
the plan. The planner can also decide to fully replan if there are reasons
for it.

If the planner knows that it needs to replan if the assumption does not hold
during execution, the cost of replanning multiplied by the chance of the
assumption not holding during execution should be part of the decision to
deliver a plan with an assumption in the first place.
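
The cost argument above can be written down as a small calculation. The following is only a sketch under stated assumptions: the function names, the cost units and the idea of a single failure probability are all hypothetical, and nothing like this exists in the PostgreSQL planner. It weighs a plan that relies on an assumption against the safe plan, charging the replanning cost plus the safe plan's cost whenever the assumption fails.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of PostgreSQL: expected cost of delivering
 * a plan that relies on an assumption which may fail at execution time.
 *
 * p_fail is the estimated probability (0..1) that the assumption no longer
 * holds at executor startup. When it fails we pay for replanning and then
 * run the fallback (safe) plan instead.
 */
static double
expected_cost_with_assumption(double optimistic_cost,
                              double fallback_cost,
                              double replan_cost,
                              double p_fail)
{
    assert(p_fail >= 0.0 && p_fail <= 1.0);
    return (1.0 - p_fail) * optimistic_cost
         + p_fail * (replan_cost + fallback_cost);
}

/* Deliver the optimistic plan only if its expected cost beats the safe plan. */
static bool
worth_assuming(double optimistic_cost, double fallback_cost,
               double replan_cost, double p_fail)
{
    return expected_cost_with_assumption(optimistic_cost, fallback_cost,
                                         replan_cost, p_fail) < fallback_cost;
}
```

With a cheap optimistic plan and a low failure probability the assumption pays off; with a high failure probability and expensive replanning the safe plan wins.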

There are also other cases such as MergeJoins performing btree index scans
in order to obtain ordered results for a MergeJoin that would be better
executed as a SeqScan when the MergeJoin can be removed.

Perhaps some costs could be adjusted at planning time when there's a
possibility that joins could be removed at execution time, although I'm
not quite sure about this as it risks generating a poor plan in the case
when the joins cannot be removed.

Maybe this is a case where you are better off replanning if the assumption
doesn't hold instead of changing the generated execution plan. In that case
you can remove the join before the path is made.

Comments are most welcome

Regards

David Rowley

Regards,

Mart

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: Mart Kelder (#2)

Re: Removing INNER JOINs

On 30 November 2014 at 23:19, Mart Kelder <mart@kelder31.nl> wrote:

I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions during planning are useful. I think there
are more possibilities for this kind of assumption (for example unique
constraints, empty tables).

The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing
a join in the executor.

Once a plan is made, you know which assumptions the final plan relies on.
In this case, the assumption is that a foreign key is still valid. In
general, there are a lot more assumptions, such as the continued existence
of an index or of columns. There are also soft assumptions, such as
assuming that the statistics used are still reasonable.

Hi Mart,

That's an interesting idea. Though I think it would be much harder to
decide if it's a good idea to go off and replan for things like empty
tables, as that's not known at executor startup and may only be discovered
99% of the way through the plan execution. In that case, going off and
replanning and starting execution all over again might throw away too much
hard work.

It does seem like a good idea for things that could be known at executor
start-up, I guess this would likely include LEFT JOIN removals using
deferrable unique indexes... Currently these indexes are ignored by the
current join removal code as they mightn't be unique until the transaction
finishes.

I'm imagining this being implemented by passing the planner a set of flags
which are assumptions that the planner is allowed to make... During the
planner's work, if it generated a plan which required this assumption to be
met, then it could set this flag in the plan somewhere which would force
the executor to check this at executor init. If the executor found any
required flag's conditions to be not met, then the executor would request a
new plan passing all the original flags, minus the ones that the conditions
have been broken on.

I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.

It would be interesting to hear Tom's opinion on this.

Regards

David Rowley

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: David Rowley (#3)

Re: Removing INNER JOINs

David Rowley <dgrowleyml@gmail.com> writes:

I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.

It would be interesting to hear Tom's opinion on this.

TBH I don't like this patch at all even in its current form, let alone
a form that's several times more invasive. I do not think there is a
big enough use-case to justify such an ad-hoc and fundamentally different
way of doing things. I think it's probably buggy as can be --- one thing
that definitely is a huge bug is that it modifies the plan tree in-place,
ignoring the rule that the plan tree is read-only to the executor.
Another question is what effect this has on EXPLAIN; there's basically
no way you can avoid lying to the user about what's going to happen at
runtime.

One idea you might think about to ameliorate those two objections is two
separate plan trees underneath an AlternativeSubPlan or similar kind of
node.

At a more macro level, there's the issue of how can the planner possibly
make intelligent decisions at other levels of the join tree when it
doesn't know the cost of this join. For that matter there's nothing
particularly driving the planner to arrange the tree so that the
optimization is possible at all.

Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.

regards, tom lane


David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: Tom Lane (#4)

Re: Removing INNER JOINs

On 1 December 2014 at 06:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <dgrowleyml@gmail.com> writes:

I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.

It would be interesting to hear Tom's opinion on this.

Another question is what effect this has on EXPLAIN; there's basically
no way you can avoid lying to the user about what's going to happen at
runtime.

One of us must be missing something here. As far as I see it, there are no
lies told; EXPLAIN shows exactly the plan that will be executed. All of
the regression tests I've added rely on this.

Regards

David Rowley

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#4)

Re: Removing INNER JOINs

On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.

This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.

I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.

Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that the planner might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Robert Haas (#6)

Re: Removing INNER JOINs

* Robert Haas (robertmhaas@gmail.com) wrote:

On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.

This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.

For my 2c, I'm completely with Robert on this one. There are a lot of
cases this could help with, particularly things coming out of ORMs
(which, yes, might possibly be better written, but that's a different
issue).

I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.

Agreed with this also; deferred triggers are not commonplace in my
experience, and when they *do* come up, it's because you have a
long-running data load or similar where you're not going to
care one bit that large, complicated JOINs aren't as fast as they
might have been otherwise.

Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that the planner might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.

Right, we can't get it wrong in the face of deferred triggers either.
Have we considered only doing the optimization for read-only
transactions? I'm not thrilled with that, but at least we'd get out
from under this deferred triggers concern. Another way might be an
option to say "use the optimization, but throw an error if you run
into a deferred trigger", or perhaps save both plans and use whichever
one we can when we get to execution time? That could make planning
time go up too much to work, but perhaps it's worth testing..

Thanks,

Stephen

David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: Robert Haas (#6)

Re: Removing INNER JOINs

On 3 December 2014 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.

This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.

I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.

Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that the planner might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.

Just so that I don't end up going around in circles again, let me
summarise my understanding of the pros and cons of each of the states that
this patch has been in.

*** Method 1: Removing Inner Joins at planning time:

Pros:

1. Plan generated should be optimal, i.e. it should generate the same plan
for the query as if the removed relations were never included in the
query's text.
2. On successful join removal, planning will likely be faster as there are
fewer paths to consider, with fewer relations and join combinations.

Cons:
1. Assumptions must be made during planning about the trigger queue being
empty or not. During execution, if there are pending fk triggers which need
to be executed then we could produce wrong results.

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.

Cons:
1. Planner may not generate an optimal plan, e.g. Sort nodes may be useless
for merge joins.
2. Code needed to be added to all join methods to allow skipping; nested
loop joins suffered from a small overhead.
3. Small overhead from visiting extra nodes in the plan which would not be
present if those nodes had been removed.
4. Problems writing regression tests due to having to use EXPLAIN ANALYZE
to try to work out what's going on, and the output containing variable
runtime values.

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)

Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)

Someone has had a problem with each of the methods listed above, and based
on the feedback given I've made changes and ended up with the next
revision of the patch.

Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent as I really do hate the extra
useless nodes that I'm unable to remove from the plan.

I'd like to propose Method 4, which I believe solves quite a few of the
problems seen in the other methods.

*** Method 4: (which I think is what Mart had in mind; I've only expanded on
it a bit with thoughts about possible implementation methods)

1. Invent planner flags which control the optimiser's ability to perform
join removals
2. Add a GUC for the default planner flags. (PLANFLAG_REMOVE_INNER_JOINS)
3. Join removal code checks if the appropriate planner flag is set before
performing join removal.
4. If join removals are performed, the planner records in the plan which
flags it "utilised".
5. At executor startup, check the plan's "utilised" flags and verify that the
plan is compatible with the current executor state. e.g. if
PLANFLAG_REMOVE_INNER_JOINS is set, then we'd better be sure there are no
pending foreign key triggers; if there are, the executor invokes the
planner with: planflags & ~(all_flags_which_are_not_compatible)
6. The planner generates a plan without removing inner joins (and does not
set the utilised flag).
7. goto step 5
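
The loop in steps 5 to 7 can be sketched in a few lines of C. Everything here is an assumption made for illustration: the bit values, the `Sketch*` types and the toy `sketch_planner()` are hypothetical stand-ins, and whether the executor may call the real planner at all is exactly the open question in this thread. Only the two flag names come from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names are taken from the thread; the bit values are assumptions. */
#define PLANFLAG_REMOVE_INNER_JOINS                        (1 << 0)
#define PLANFLAG_REMOVE_LEFT_JOIN_WITH_DEFERRED_UNIQUE_IDX (1 << 1)

/* Hypothetical stand-in for a finished plan. */
typedef struct SketchPlan
{
    int utilised_flags;     /* assumptions this plan relies on */
} SketchPlan;

/* Hypothetical stand-in for what the executor knows at startup. */
typedef struct SketchExecState
{
    bool fk_triggers_pending;
    bool unique_checks_pending;
} SketchExecState;

/* Toy planner: pretends every allowed optimisation was actually used. */
static SketchPlan
sketch_planner(int allowed_flags)
{
    SketchPlan plan = { allowed_flags };
    return plan;
}

/* Step 5: which utilised flags are unsafe in the current executor state? */
static int
incompatible_flags(int utilised, const SketchExecState *state)
{
    int bad = 0;

    if ((utilised & PLANFLAG_REMOVE_INNER_JOINS) && state->fk_triggers_pending)
        bad |= PLANFLAG_REMOVE_INNER_JOINS;
    if ((utilised & PLANFLAG_REMOVE_LEFT_JOIN_WITH_DEFERRED_UNIQUE_IDX) &&
        state->unique_checks_pending)
        bad |= PLANFLAG_REMOVE_LEFT_JOIN_WITH_DEFERRED_UNIQUE_IDX;
    return bad;
}

/* Steps 5-7: replan with the offending flags masked out until compatible. */
static SketchPlan
plan_for_execution(int planflags, const SketchExecState *state, int *nplans)
{
    SketchPlan plan = sketch_planner(planflags);

    assert(state != NULL && nplans != NULL);
    *nplans = 1;
    for (;;)
    {
        int bad = incompatible_flags(plan.utilised_flags, state);

        if (bad == 0)
            return plan;
        planflags &= ~bad;          /* planflags & ~(incompatible flags) */
        plan = sketch_planner(planflags);
        (*nplans)++;
    }
}
```

Each pass clears at least one flag bit, so the loop terminates after at most one replan per flag.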

If any users are suffering the overhead of this replanning then they can
zero out the planner_flags GUC and get the standard behaviour back.

This would also allow deferrable unique indexes to be used for LEFT JOIN
removals... We'd just need to tag
PLANFLAG_REMOVE_LEFT_JOIN_WITH_DEFERRED_UNIQUE_IDX (or something shorter),
onto the utilised flags and have the executor check that no unique indexes
are waiting to be updated.

Things I'm currently not sure about are:

a. can we invoke the planner during executor init?
b. PREPAREd statements... Which plan do we cache? It might not be very nice
to force the executor to re-plan if the generated plan was not compatible
with the current executor state. Or if we then replaced the cached plan,
then subsequent executions of the prepared statement could contain
redundant joins. Perhaps we can just stash both plans having planned them
lazily as and when required.

Pros:
1. Generates optimal plan
2. Could speed up planning when useless joins are removed.
3. Executor does not have to modify the plan.
4. No wrong results from removing joins when there's pending fk triggers.
5. No extra overhead from visiting useless plan nodes at execution time.

Cons:
1. Executor may have to invoke planner.
2. May have to plan queries twice.

I'm not seeing cons #2 as massively bad, as this likely won't happen too
often. This seems far better than generating an alternative plan which may
never be used, though it all hangs on one question: is it even possible for
the executor to call planner() or standard_planner()?

Regards

David Rowley

Simon Riggs

simon@2ndQuadrant.com

about 11 years ago

In reply to: David Rowley (#8)

Re: Removing INNER JOINs

On 3 December 2014 at 09:29, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)

Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)

Someone has had a problem with each of the methods listed above, and based
on the feedback given I've made changes and ended up with the next
revision of the patch.

Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent as I really do hate the extra
useless nodes that I'm unable to remove from the plan.

I guess we need an Option node. Tom and I discussed that about an aeon ago.

The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.

That way we can see what is happening in the plan and the executor
doesn't need to edit anything.
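
A rough sketch of what such an Option node might look like, purely as an assumption for illustration: the `Sketch*` types, the test callback and the function names are hypothetical and do not correspond to any existing PostgreSQL node type. The point is only that both subplans are visible in the plan and the executor picks one without editing either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a plan subtree. */
typedef struct SketchNode
{
    const char *label;
} SketchNode;

/* Test evaluated once at executor startup. */
typedef bool (*OptionTest) (const void *exec_state);

/* Hypothetical Option node: one test, one subplan per outcome. */
typedef struct SketchOptionNode
{
    OptionTest        test;
    const SketchNode *plan_if_true;     /* e.g. plan with the join removed */
    const SketchNode *plan_if_false;    /* e.g. plan that performs the join */
} SketchOptionNode;

/* Example test: exec_state points at a "fk triggers pending" boolean. */
static bool
no_pending_fk_triggers(const void *exec_state)
{
    const bool *pending = exec_state;

    return !*pending;
}

/* Select a subplan; neither subplan is modified. */
static const SketchNode *
option_choose(const SketchOptionNode *opt, const void *exec_state)
{
    assert(opt != NULL && opt->test != NULL);
    return opt->test(exec_state) ? opt->plan_if_true : opt->plan_if_false;
}
```

Because the node only selects between two read-only subtrees, EXPLAIN could in principle show both alternatives alongside the test.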

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Simon Riggs (#9)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 3 December 2014 at 09:29, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)

Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)

Someone has had a problem with each of the methods listed above, and based
on the feedback given I've made changes and ended up with the next
revision of the patch.

Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent as I really do hate the extra
useless nodes that I'm unable to remove from the plan.

I guess we need an Option node. Tom and I discussed that about an aeon ago.

The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.

That way we can see what is happening in the plan and the executor
doesn't need to edit anything.

So the planner keeps a plan for every possibility, or it looks at the
possible conditions (like the presence of a foreign key in this case, for
example) and then lets the executor choose between them?

So is the idea essentially making the planner return a set of "best" plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?

IMO this sounds like punting the planner's task to executor. Not to mention
some overhead for maintaining various plans that might have been discarded
early in the planning and path cost evaluation phase (consider a path with
pathkeys specified, like with ORDINALITY. Can there be edge cases where we
might end up invalidating the entire path if we let executor modify it, or,
maybe just lose the ordinality optimization?)

I agree that executor should not modify plans, but letting executor choose
the plan to execute (out of a set from planner, of course) rather than
planner giving executor a single plan and executor not caring about the
semantics, seems a bit counterintuitive to me. It might be just me though.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*

#11

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Atri Sharma (#10)

Re: Removing INNER JOINs

* Atri Sharma (atri.jiit@gmail.com) wrote:

So the planner keeps a plan for every possibility, or it looks at the
possible conditions (like the presence of a foreign key in this case, for
example) and then lets the executor choose between them?

Right, this was one of the thoughts that I had.

So is the idea essentially making the planner return a set of "best" plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?

We *already* have an idea of there being multiple plans (see
plancache.c).

IMO this sounds like punting the planner's task to executor. Not to mention
some overhead for maintaining various plans that might have been discarded
early in the planning and path cost evaluation phase (consider a path with
pathkeys specified, like with ORDINALITY. Can there be edge cases where we
might end up invalidating the entire path if we let executor modify it, or,
maybe just lose the ordinality optimization?)

The executor isn't modifying the plan, it's just picking one based on
what the current situation is (which is information that only the
executor can have, such as if there are pending deferred triggers).

I agree that executor should not modify plans, but letting executor choose
the plan to execute (out of a set from planner, of course) rather than
planner giving executor a single plan and executor not caring about the
semantics, seems a bit counterintuitive to me. It might be just me though.

I don't think it follows that the executor is now required to care about
semantics. The planner says "use plan A if X is true; use plan B if X
is not true" and then the executor does exactly that. Nothing about the
plans provided by the planner is being changed and
there is no re-planning going on (though, as I point out, we actually
*do* re-plan in cases where we think the new plan is much much better
than the prior plan..).

Thanks!

Stephen

#12

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Stephen Frost (#11)

Re: Removing INNER JOINs

* Stephen Frost (sfrost@snowman.net) wrote:

* Atri Sharma (atri.jiit@gmail.com) wrote:

So the planner keeps a plan for every possibility, or it looks at the
possible conditions (like the presence of a foreign key in this case, for
example) and then lets the executor choose between them?

Right, this was one of the thoughts that I had.

Erm, "I had also". Don't mean to imply that it was all my idea or
something silly like that.

Thanks,

Stephen

#13

Andres Freund

andres@2ndquadrant.com

about 11 years ago

In reply to: Simon Riggs (#9)

Re: Removing INNER JOINs

On 2014-12-03 11:30:32 +0000, Simon Riggs wrote:

I guess we need an Option node. Tom and I discussed that about an aeon ago.

The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.

That way we can see what is happening in the plan and the executor
doesn't need to edit anything.

Given David's result where he noticed a performance impact due to the
additional branch in the join code - which I still have a bit of a hard
time believing - it seems likely that a whole separate node that has to
pass stuff around will be more expensive.

I think the switch would actually have to be done in ExecInitNode() et
al. David, if you essentially take your previous solution and move the
if into ExecInitNode(), does it work well?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#14

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: David Rowley (#8)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 1: Removing Inner Joins at planning time:

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]

a. can we invoke the planner during executor init?

I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused. I am less
clear on the specifics of why the former is unsafe, but what I think
it boils down to is that the plan per se needs to be finalized before
we begin execution; any replanning needs to be handled in the
plancache code. I am not sure whether it's feasible to do something
about this at the plancache layer; we have an is_oneshot flag there,
so perhaps one-shot plans could simply test whether there are pending
triggers, and non-oneshot plans could forego the optimization until we
come up with something better.
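
A minimal sketch of that plancache-level gate, under stated assumptions: this is not PostgreSQL code, the struct and function names are hypothetical, and the count of pending foreign-key triggers is simply passed in rather than looked up anywhere real. It only illustrates the rule "one-shot plans test for pending triggers, reusable plans forgo the optimization".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached plan source; only the notion of a
 * one-shot flag comes from the discussion above. */
typedef struct GatePlanSource
{
    bool is_oneshot;    /* plan is built, executed once, then discarded */
} GatePlanSource;

/*
 * Decide whether join removal may be applied while building a plan.
 *
 * A one-shot plan runs immediately, so checking for pending foreign-key
 * triggers at plan time is enough.  A reusable plan might be executed
 * later in a different trigger state, so it skips the optimization.
 */
static bool
gate_may_remove_joins(const GatePlanSource *src, int pending_fk_triggers)
{
    assert(src != NULL);
    assert(pending_fk_triggers >= 0);

    if (!src->is_oneshot)
        return false;
    return pending_fk_triggers == 0;
}
```

The design choice here is conservative on purpose: the reusable case never gets the optimization, which matches the "until we come up with something better" fallback.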

If that doesn't work for some reason, then I think we basically have
to give up on the idea of replanning if the situation becomes unsafe
between planning and execution. That leaves us with two alternatives.
One is to create a plan incorporating the optimization and another not
incorporating the optimization and decide between them at runtime,
which sounds expensive. The second is to create a plan that
contemplates performing the join and skip the join if it turns out to
be possible, living with the fact that the resulting plan might be
less than optimal - in other words, option #2. I am not sure that's
all that bad. Planning is ALWAYS an exercise in predicting the
future: we use statistics gathered at some point in the past, which
are furthermore imprecise, to predict what will happen if we try to
execute a given plan at some point in the future. Sometimes we are
wrong, but that doesn't prevent us from trying our best to predict
the outcome; so here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Stephen Frost (#11)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Atri Sharma (atri.jiit@gmail.com) wrote:

So the planner keeps all possibility satisfying plans, or it looks at the
possible conditions (like presence of foreign key for this case, for eg)
and then lets executor choose between them?

Right, this was one of the thoughts that I had.

So is the idea essentially making the planner return a set of "best" plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?

We *already* have an idea of there being multiple plans (see
plancache.c).

Thanks for pointing me there.

What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.

My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive when discarding potential paths. Maintaining them ahead and
storing and returning them might have issues, but that is only my thought.

--
Regards,

Atri
*l'apprenant*

#16

Andres Freund

andres@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#14)

Re: Removing INNER JOINs

On 2014-12-03 10:51:19 -0500, Robert Haas wrote:

On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 1: Removing Inner Joins at planning time:

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]

a. can we invoke the planner during executor init?

I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree. I'm right now not seeing why transforming the trees in
ExecInitNode() et al. would be unsafe - it looks fairly simple to
switch between different execution plans there.

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Andres Freund (#16)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-12-03 10:51:19 -0500, Robert Haas wrote:

On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 1: Removing Inner Joins at planning time:

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]

a. can we invoke the planner during executor init?

I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree.

We do?

I think what we have is a plan tree, which is potentially stored in a
plan cache someplace and thus must be read-only, and a planstate tree,
which contains the stuff that is for this specific execution. There's
probably some freedom to do exciting things in the planstate nodes,
but I don't think you can tinker with the plan itself.
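
The split described above can be sketched as follows. This is a simplified illustration with hypothetical types (`RoPlan`, `RoPlanState`), not the real node definitions, which carry far more fields. The plan is only ever reached through `const` pointers, while each execution gets its own freshly allocated state tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical read-only plan node: may live in a plan cache and be shared. */
typedef struct RoPlan
{
    const char *label;              /* e.g. "NestLoop", "SeqScan a" */
    const struct RoPlan *left;
    const struct RoPlan *right;
} RoPlan;

/* Hypothetical per-execution state node: private to one run of the plan. */
typedef struct RoPlanState
{
    const RoPlan *plan;             /* read-only back-pointer to the plan */
    struct RoPlanState *left;
    struct RoPlanState *right;
    long tuples_returned;           /* scratch state for this execution */
} RoPlanState;

/* Build a fresh state tree; the plan is read but never written. */
static RoPlanState *
ro_init_node(const RoPlan *plan)
{
    RoPlanState *ps;

    if (plan == NULL)
        return NULL;

    ps = calloc(1, sizeof(*ps));
    assert(ps != NULL);
    ps->plan = plan;
    ps->left = ro_init_node(plan->left);
    ps->right = ro_init_node(plan->right);
    return ps;
}

/* Release one execution's state tree; the shared plan is left untouched. */
static void
ro_end_node(RoPlanState *ps)
{
    if (ps == NULL)
        return;
    ro_end_node(ps->left);
    ro_end_node(ps->right);
    free(ps);
}
```

Two calls to `ro_init_node()` on the same plan yield independent state trees that point back at the same unmodified plan, which is the property a cached plan relies on.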

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Andres Freund

andres@2ndquadrant.com

about 11 years ago

In reply to: Robert Haas (#17)

Re: Removing INNER JOINs

On 2014-12-03 11:11:49 -0500, Robert Haas wrote:

On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-12-03 10:51:19 -0500, Robert Haas wrote:

On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:

*** Method 1: Removing Inner Joins at planning time:

*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]

a. can we invoke the planner during executor init?

I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree.

We do?

I think what we have is a plan tree, which is potentially stored in a
plan cache someplace and thus must be read-only, and a planstate tree,
which contains the stuff that is for this specific execution. There's
probably some freedom to do exciting things in the planstate nodes,
but I don't think you can tinker with the plan itself.

Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
    PlanState *
    ExecInitNode(Plan *node, EState *estate, int eflags)
    {
        ...
            case T_NestLoop:
                result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                        estate, eflags);
by
            case T_NestLoop:
                if (JoinCanBeSkipped(node))
                    result = NonSkippedJoinNode(node);
                else
                    result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                            estate, eflags);

Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.

We'd probably want to move the join nodes into a separate ExecInitJoin()
function and do the JoinCanBeSkipped() and NonSkippedJoinNode() handling in the
generic code.
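
A self-contained sketch of that init-time dispatch, assuming hypothetical simplified types: `SkipPlan`, `SkipPlanState`, the `inner_removable` flag and the `fk_triggers_pending` argument are all illustrative stand-ins, and the real skip test is whatever JoinCanBeSkipped() would contain. The plan tree is never modified; only the state tree built from it differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { SKIP_SCAN, SKIP_JOIN } SkipTag;

/* Hypothetical read-only plan node. */
typedef struct SkipPlan
{
    SkipTag tag;
    const char *relname;            /* scans only */
    const struct SkipPlan *outer;   /* joins only */
    const struct SkipPlan *inner;   /* joins only */
    bool inner_removable;           /* planner hint: a foreign key proves a match */
} SkipPlan;

/* Hypothetical per-execution state node. */
typedef struct SkipPlanState
{
    const SkipPlan *plan;
    struct SkipPlanState *outer;
    struct SkipPlanState *inner;
} SkipPlanState;

/* Assumed run-time test: skipping is safe only if no FK triggers are queued. */
static bool
skip_join_can_be_skipped(const SkipPlan *plan, bool fk_triggers_pending)
{
    return plan->tag == SKIP_JOIN && plan->inner_removable &&
           !fk_triggers_pending;
}

/* Build the state tree, leaving out joins that can be skipped. */
static SkipPlanState *
skip_init_node(const SkipPlan *plan, bool fk_triggers_pending)
{
    SkipPlanState *ps;

    assert(plan != NULL);

    /* Skipped join: initialise only the side that still has to be scanned. */
    if (skip_join_can_be_skipped(plan, fk_triggers_pending))
        return skip_init_node(plan->outer, fk_triggers_pending);

    ps = calloc(1, sizeof(*ps));
    assert(ps != NULL);
    ps->plan = plan;
    if (plan->tag == SKIP_JOIN)
    {
        ps->outer = skip_init_node(plan->outer, fk_triggers_pending);
        ps->inner = skip_init_node(plan->inner, fk_triggers_pending);
    }
    return ps;
}
```

Note that when the join is skipped, the state tree's root points at the scan's plan node rather than the join's, so the two trees no longer correspond node for node.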

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#19

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Atri Sharma (#15)

Re: Removing INNER JOINs

* Atri Sharma (atri.jiit@gmail.com) wrote:

What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.

Not sure I follow this thought entirely.. The plans in the plancache
are competing, but separate, plans.

My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive when discarding potential paths. Maintaining them ahead and
storing and returning them might have issues, but that is only my thought.

The planner is aggressive at discarding potential paths, but this is all
a consideration for how expensive this particular optimization is, not
an issue with the approach itself. We certainly don't want an
optimization that doubles the time for 100% of queries planned but only
saves time in 5% of the cases, but if we can spend an extra 5% of the
time required for planning in the 1% of cases where the optimization
could possibly happen to save a huge amount of time for those queries,
then it's something to consider.

We would definitely want to spend as little time as possible checking
for this optimization in cases where it isn't possible to use the
optimization.

Thanks,

Stephen

#20

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Andres Freund (#18)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
    PlanState *
    ExecInitNode(Plan *node, EState *estate, int eflags)
    {
        ...
            case T_NestLoop:
                result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                        estate, eflags);
by
            case T_NestLoop:
                if (JoinCanBeSkipped(node))
                    result = NonSkippedJoinNode(node);
                else
                    result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                            estate, eflags);

Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.

Yeah, maybe. I think there's sort of a coding principle that the plan
and planstate trees should match up one-to-one, but it's possible that
nothing breaks if they don't, or that I've misunderstood the coding
rule in the first instance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#20)

Re: Removing INNER JOINs

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:

Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
    PlanState *
    ExecInitNode(Plan *node, EState *estate, int eflags)
    {
        ...
            case T_NestLoop:
                result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                        estate, eflags);
by
            case T_NestLoop:
                if (JoinCanBeSkipped(node))
                    result = NonSkippedJoinNode(node);
                else
                    result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
                                                            estate, eflags);

Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.

Yeah, maybe. I think there's sort of a coding principle that the plan
and planstate trees should match up one-to-one, but it's possible that
nothing breaks if they don't, or that I've misunderstood the coding
rule in the first instance.

Far better would be what I mentioned upthread: an explicit switch node
in the plan tree, analogous to the existing AlternativeSubPlan structure.

    ChooseJoinSubPlan
      -> plan tree requiring all tables to be joined
      -> plan tree not requiring all tables to be joined

This allows sensible display by EXPLAIN and avoids the need for the core
executor code to be dirtied with implementation of the precise switch
rule: all that logic goes into the ChooseJoinSubPlan plan node code.

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.
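
As a rough illustration of the switch node, here is a sketch with hypothetical types. `SwitchSubPlan`, `SwitchChooseJoinSubPlan` and the `removal_is_safe` input are assumptions made for the example; the precise switch rule is left open above, so it appears here only as a boolean handed to the node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical finished subplan; a real one would be a whole plan tree. */
typedef struct SwitchSubPlan
{
    const char *description;
    double est_cost;
} SwitchSubPlan;

/* Hypothetical switch node holding both alternatives side by side. */
typedef struct SwitchChooseJoinSubPlan
{
    const SwitchSubPlan *full_join_plan;    /* all tables joined */
    const SwitchSubPlan *reduced_plan;      /* removable joins dropped; may be NULL */
} SwitchChooseJoinSubPlan;

/*
 * Pick the subplan to run.  All switch logic lives in this one node, so
 * the generic executor code never needs to know about join removal.
 */
static const SwitchSubPlan *
switch_choose_subplan(const SwitchChooseJoinSubPlan *node, bool removal_is_safe)
{
    assert(node != NULL);
    assert(node->full_join_plan != NULL);

    if (removal_is_safe && node->reduced_plan != NULL)
        return node->reduced_plan;
    return node->full_join_plan;
}
```

Because both alternatives are ordinary children of the node, a plan printer can show them side by side, and the fully joined plan is always available as the fallback.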

regards, tom lane

#22

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#21)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.

Doesn't this double the planning overhead, in most cases for no
benefit? The alternative plan used only when there are deferred
triggers is rarely going to get used.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#23

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Robert Haas (#22)

Re: Removing INNER JOINs

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.

Doesn't this double the planning overhead, in most cases for no
benefit? The alternative plan used only when there are deferred
triggers is rarely going to get used.

Personally, I remain of the opinion that this optimization will apply in
only a tiny fraction of real-world cases, so I'm mostly concerned about
not blowing out planning time when the optimization doesn't apply.
However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

regards, tom lane

#24

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Tom Lane (#23)

Re: Removing INNER JOINs

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.

Doesn't this double the planning overhead, in most cases for no
benefit? The alternative plan used only when there are deferred
triggers is rarely going to get used.

Personally, I remain of the opinion that this optimization will apply in
only a tiny fraction of real-world cases, so I'm mostly concerned about
not blowing out planning time when the optimization doesn't apply.

This was my thought also- most of the time we won't be able to apply the
optimization and we'll know that pretty early on and can skip the double
planning. What makes this worthwhile is that there are cases where
it'll be applied regularly due to certain tools/technologies being used
and the extra planning will be more than made up for by the reduction in
execution time.

However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

Yeah, I don't see a way around that..

Thanks,

Stephen

#25

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Stephen Frost (#24)

Re: Removing INNER JOINs

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

Yeah, I don't see a way around that..

Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO. This means that for small join problems, the
total planning time would be much less than double anyway. For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional). So I think "it'll double the runtime"
is an unfounded objection, or at least there's good reason to hope it's
unfounded.

regards, tom lane

#26

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Tom Lane (#25)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

Yeah, I don't see a way around that..

Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO. This means that for small join problems, the
total planning time would be much less than double anyway. For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional). So I think "it'll double the runtime"
is an unfounded objection, or at least there's good reason to hope it's
unfounded.

Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I wonder if we could have a rule based system for replacement of some plan
nodes with other type of nodes. As we discover more cases like this, we
could add more rules. Wild thought though.

--
Regards,

Atri
*l'apprenant*

#27

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: Tom Lane (#25)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

Yeah, I don't see a way around that..

Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO. This means that for small join problems, the
total planning time would be much less than double anyway. For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional). So I think "it'll double the runtime"
is an unfounded objection, or at least there's good reason to hope it's
unfounded.

OK. One other point of hope is that, in my experience, the queries
where you need join removal are the ones where there are lots of
tables being joined and there are often quite a few of those joins
that can be removed, not just one. So the extra planner overhead
might pay off anyway.

(It still seems a shame to have to plan for the not-removing-the-joins
case since it will so rarely happen. But maybe I should take what I
can get.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Atri Sharma (#26)

Re: Removing INNER JOINs

Atri Sharma <atri.jiit@gmail.com> writes:

Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I think you assume wrong; or at least, I certainly would not wish to
hard-wire any such assumption. Skipping some joins could change the
shape of the join tree *completely*, because the cost estimates will
change so much. And that could in turn lead to making different choices
of scan methods, eg, we might or might not care about sort order of
a scan result if we change join methods.

regards, tom lane

#29

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Tom Lane (#28)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Atri Sharma <atri.jiit@gmail.com> writes:

Is it possible to only replan part of the plan in case of this
optimization? I think that we might need to only replan parts of the
original plan (as you mentioned, join search and above). So we could reuse
the original plan in part and not do a lot of replanning (an obvious case
is scan strategy, which we can assume will not change for the two plans).

I think you assume wrong; or at least, I certainly would not wish to
hard-wire any such assumption. Skipping some joins could change the
shape of the join tree *completely*, because the cost estimates will
change so much. And that could in turn lead to making different choices
of scan methods, eg, we might or might not care about sort order of
a scan result if we change join methods.

regards, tom lane

Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).

#30

Stephen Frost

sfrost@snowman.net

about 11 years ago

In reply to: Atri Sharma (#29)

Re: Removing INNER JOINs

* Atri Sharma (atri.jiit@gmail.com) wrote:

Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).

All neat ideas but how about we get something which works in the way
being asked for before we start trying to optimize it..? Maybe I'm
missing something, but getting all of this infrastructure into place and
making sure things aren't done to the plan tree which shouldn't be (or
done to all of them if necessary..) is enough that we should get that
bit done first and then worry if there are ways we can further improve
things..

Thanks,

Stephen

#31

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Stephen Frost (#30)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:27 PM, Stephen Frost <sfrost@snowman.net> wrote:

* Atri Sharma (atri.jiit@gmail.com) wrote:

Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).

All neat ideas but how about we get something which works in the way
being asked for before we start trying to optimize it..? Maybe I'm
missing something, but getting all of this infrastructure into place and
making sure things aren't done to the plan tree which shouldn't be (or
done to all of them if necessary..) is enough that we should get that
bit done first and then worry if there are ways we can further improve
things..

Right, sorry for digressing.

I think we are in agreement as to what needs to be done (start with a plan,
note ideas and replan if necessary). The idea of the executor modifying the
plan (or, personally, even choosing the plan) seems counterintuitive.

Does it also make sense to recalculate the costs from scratch for the
replan? It might be, I am just asking.

Regards,

Atri

#32

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Stephen Frost (#30)

Re: Removing INNER JOINs

Stephen Frost <sfrost@snowman.net> writes:

* Atri Sharma (atri.jiit@gmail.com) wrote:

Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).

All neat ideas but how about we get something which works in the way
being asked for before we start trying to optimize it..? Maybe I'm
missing something, but getting all of this infrastructure into place and
making sure things aren't done to the plan tree which shouldn't be (or
done to all of them if necessary..) is enough that we should get that
bit done first and then worry if there are ways we can further improve
things..

Yeah; moreover, there's no evidence that hard-wiring such assumptions
would save anything. In the example of a FunctionScan, guess what:
there's only one Path for that relation anyway.

I think the right approach for now is to emulate the GEQO precedent as
closely as possible. Build all the single-relation Paths the same as
now, then do a join search over all the relations, then (if we've noticed
that some joins are potentially removable) do another join search over
just the nonremovable relations.
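
A toy sketch of that two-pass structure, under loudly stated assumptions: the types, the `removable` flag and especially the cost model (scan cost plus one unit per row of the input joined so far, with a join's row count taken as the largest input) are invented for illustration and have nothing to do with the real cost functions. What it shows is the shape: per-relation costs are computed once and reused, and the join search runs a second time only when something is removable.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RELS 8

typedef struct ToyRel
{
    double rows;        /* estimated row count */
    double scan_cost;   /* cost of the cheapest single-relation path */
    bool removable;     /* join to this relation is potentially removable */
} ToyRel;

typedef struct ToyPlanPair
{
    double full_cost;       /* best join order over all relations */
    double reduced_cost;    /* best join order over nonremovable relations */
    bool has_reduced;       /* false if nothing (or everything) is removable */
} ToyPlanPair;

/* Exhaustive left-deep join search over the relations selected by "mask". */
static double
toy_join_search(const ToyRel *rels, int nrels, unsigned mask)
{
    double best[1u << TOY_MAX_RELS];
    double rows[1u << TOY_MAX_RELS];
    unsigned s;
    int r;

    assert(nrels > 0 && nrels <= TOY_MAX_RELS);
    assert(mask != 0 && mask < (1u << nrels));

    for (s = 1; s <= mask; s++)
    {
        if ((s & mask) != s)
            continue;               /* not a subset of the requested relations */
        best[s] = -1.0;
        rows[s] = 0.0;
        for (r = 0; r < nrels; r++)
        {
            unsigned bit = 1u << r;
            unsigned rest = s & ~bit;
            double cost;

            if (!(s & bit))
                continue;
            if (rels[r].rows > rows[s])
                rows[s] = rels[r].rows;
            /* Join relation r last, on top of the best plan for the rest. */
            cost = (rest == 0) ? rels[r].scan_cost
                               : best[rest] + rels[r].scan_cost + rows[rest];
            if (best[s] < 0.0 || cost < best[s])
                best[s] = cost;
        }
    }
    return best[mask];
}

/* First search over everything, then (only if useful) over the kept rels. */
static ToyPlanPair
toy_plan_both(const ToyRel *rels, int nrels)
{
    ToyPlanPair out;
    unsigned all = (1u << nrels) - 1u;
    unsigned keep = 0;
    int r;

    for (r = 0; r < nrels; r++)
        if (!rels[r].removable)
            keep |= 1u << r;

    out.full_cost = toy_join_search(rels, nrels, all);
    out.has_reduced = (keep != all && keep != 0);
    out.reduced_cost = out.has_reduced ? toy_join_search(rels, nrels, keep)
                                       : out.full_cost;
    return out;
}
```

When no relation is marked removable, the second search is never run, so the extra planning effort is limited to noticing that fact.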

regards, tom lane

#33

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Atri Sharma (#31)

Re: Removing INNER JOINs

Atri Sharma <atri.jiit@gmail.com> writes:

Does it also make sense to recalculate the costs from scratch for the
replan? It might be, I am just asking.

The join costs would be recalculated from scratch, yes. The
single-relation Paths would already exist and their costs would not
change. Again, if you've not studied how GEQO works, you probably
should go do that before thinking more about how this would work.

regards, tom lane

#34

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Tom Lane (#32)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Atri Sharma (atri.jiit@gmail.com) wrote:

Agreed, but in some cases, we could possibly make some assumptions (if
there is no index, if a large fraction of table will be returned in scan,
FunctionScan).

All neat ideas but how about we get something which works in the way
being asked for before we start trying to optimize it..? Maybe I'm
missing something, but getting all of this infrastructure into place and
making sure things aren't done to the plan tree which shouldn't be (or
done to all of them if necessary..) is enough that we should get that
bit done first and then worry if there are ways we can further improve
things..

Yeah; moreover, there's no evidence that hard-wiring such assumptions
would save anything. In the example of a FunctionScan, guess what:
there's only one Path for that relation anyway.

That is precisely what I meant :) I guess I was being overcautious
and even trying to save the time spent in evaluating whatever paths we have
and building new FunctionScan paths...

I think the right approach for now is to emulate the GEQO precedent as
closely as possible. Build all the single-relation Paths the same as
now, then do a join search over all the relations, then (if we've noticed
that some joins are potentially removable) do another join search over
just the nonremovable relations.

How about using geqo more liberally when replanning (decrease the number of
relations in join before geqo is hit?)

--
Regards,

Atri
*l'apprenant*

#35

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Atri Sharma (#34)

Re: Removing INNER JOINs

Atri Sharma <atri.jiit@gmail.com> writes:

On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think the right approach for now is to emulate the GEQO precedent as
closely as possible. Build all the single-relation Paths the same as
now, then do a join search over all the relations, then (if we've noticed
that some joins are potentially removable) do another join search over
just the nonremovable relations.

How about using geqo more liberally when replanning (decrease the number of
relations in join before geqo is hit?)

This is going to be quite difficult enough without overcomplicating it.
Or as a wise man once said, "premature optimization is the root of all
evil". Get it working in the basic way and then see if improvement is
necessary at all.

regards, tom lane


#36

Atri Sharma

atri.jiit@gmail.com

about 11 years ago

In reply to: Tom Lane (#35)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 11:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Atri Sharma <atri.jiit@gmail.com> writes:

On Wed, Dec 3, 2014 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think the right approach for now is to emulate the GEQO precedent as
closely as possible. Build all the single-relation Paths the same as
now, then do a join search over all the relations, then (if we've noticed
that some joins are potentially removable) do another join search over
just the nonremovable relations.

How about using geqo more liberally when replanning (decrease the number of
relations in join before geqo is hit?)

This is going to be quite difficult enough without overcomplicating it.
Or as a wise man once said, "premature optimization is the root of all
evil". Get it working in the basic way and then see if improvement is
necessary at all.

Sure, I can take a crack at it since I am working on a patch that does
require this alternative path approach. Let me try something and report my
experimental results.

#37

Heikki Linnakangas

hlinnakangas@vmware.com

about 11 years ago

In reply to: Robert Haas (#27)

Re: Removing INNER JOINs

On 12/03/2014 07:41 PM, Robert Haas wrote:

On Wed, Dec 3, 2014 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

However, even granting that that is a concern, so what? You *have* to
do the planning twice, or you're going to be generating a crap plan for
one case or the other.

Yeah, I don't see a way around that..

Also, it occurs to me that it's only necessary to repeat the join search
part of the process, which means that in principle the mechanisms already
exist for that; see GEQO. This means that for small join problems, the
total planning time would be much less than double anyway. For large
problems, where the join search is the bulk of the time, we could hope
that removal of unnecessary joins would reduce the join search runtime
enough that the second search would be pretty negligible next to the
first (which is not optional). So I think "it'll double the runtime"
is an unfounded objection, or at least there's good reason to hope it's
unfounded.

OK. One other point of hope is that, in my experience, the queries
where you need join removal are the ones where there are lots of
tables being joined and there are often quite a few of those joins
that can be removed, not just one. So the extra planner overhead
might pay off anyway.

Do you need to plan for every combination, where some joins are removed
and some are not?

I hope the same mechanism could be used to prepare a plan for a query
with parameters, where the parameters might or might not allow a partial
index to be used. We have some smarts nowadays to use custom plans, but
this could be better.

- Heikki


#38

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Heikki Linnakangas (#37)

Re: Removing INNER JOINs

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Do you need to plan for every combination, where some joins are removed
and some are not?

I would vote for just having two plans and one switch node. To exploit
any finer grain, we'd have to have infrastructure that would let us figure
out *which* constraints pending triggers might indicate transient
invalidity of, and that doesn't seem likely to be worth the trouble.

I hope the same mechanism could be used to prepare a plan for a query
with parameters, where the parameters might or might not allow a partial
index to be used. We have some smarts nowadays to use custom plans, but
this could be better.

Interesting thought, but that would be a totally different switch
condition ...

regards, tom lane


#39

ktm@rice.edu

about 11 years ago

In reply to: Tom Lane (#38)

Re: Removing INNER JOINs

On Wed, Dec 03, 2014 at 02:08:27PM -0500, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Do you need to plan for every combination, where some joins are removed
and some are not?

I would vote for just having two plans and one switch node. To exploit
any finer grain, we'd have to have infrastructure that would let us figure
out *which* constraints pending triggers might indicate transient
invalidity of, and that doesn't seem likely to be worth the trouble.

I hope the same mechanism could be used to prepare a plan for a query
with parameters, where the parameters might or might not allow a partial
index to be used. We have some smarts nowadays to use custom plans, but
this could be better.

Interesting thought, but that would be a totally different switch
condition ...

regards, tom lane

Or between a node with a low rows count and a high rows count for those
pesky mis-estimation queries.

Regards,
Ken


#40

Claudio Freire

klaussfreire@gmail.com

about 11 years ago

In reply to: Robert Haas (#22)

Re: Removing INNER JOINs

On Wed, Dec 3, 2014 at 2:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Dec 3, 2014 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.

Doesn't this double the planning overhead, in most cases for no
benefit? The alternative plan used only when there are deferred
triggers is rarely going to get used.

It shouldn't. It will only double (at worst) planning overhead for the
queries that do have removable joins, which would be the ones
benefiting from the extra work.

Whether that extra work pays off is the question to ask here. Perhaps
whether or not to remove the joins could be a decision made accounting
for overall plan cost and fraction of joins removed, so as to avoid the
extra planning if execution will be fast anyway.
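
One way such a decision could look, purely as a sketch: the function name and both thresholds below are hypothetical knobs made up for illustration, and nothing like them exists in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical heuristic: replan only when the plan is expensive enough and
 * enough of its joins are removable.  Both thresholds are made-up knobs.
 */
bool
toy_worth_replanning(double plan_cost, int njoins, int nremovable,
					 double min_cost, double min_fraction)
{
	if (njoins <= 0 || nremovable <= 0)
		return false;			/* nothing to remove */
	if (plan_cost < min_cost)
		return false;			/* execution will be fast anyway */
	return ((double) nremovable / (double) njoins) >= min_fraction;
}
```

Whether any fixed thresholds would be a good idea is exactly the open question; the sketch only shows that the test itself would be cheap.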


#41

Simon Riggs

simon@2ndQuadrant.com

about 11 years ago

In reply to: Atri Sharma (#10)

Re: Removing INNER JOINs

On 3 December 2014 at 12:18, Atri Sharma <atri.jiit@gmail.com> wrote:

So the planner keeps all possibility satisfying plans, or it looks at the
possible conditions (like presence of foreign key for this case, for eg) and
then lets executor choose between them?

I'm suggesting the planner keeps 2 plans: One with removable joins,
one without the removable joins.

You could, in theory, keep track of which tables had pending after
triggers and skip pruning of just those but that would require you to
grovel around in the after trigger queue, as well as keep a rather
large number of plans. I don't think this deserves that complexity,
since ISTM very likely we'll almost never need the full plans, so we
just make one short test to see if there is anything in the trigger
queue at all and if so skip the pruning of any joins.

The Executor already has a Result node which allows it to decide what
subnodes to execute at run time, so this is similar.
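
As a rough sketch of that single test (the struct and function names here are hypothetical, and plans are represented by plain strings just to keep the example self-contained):

```c
#include <stdbool.h>

/* Hypothetical pair of plans kept by the planner. */
typedef struct ToyPlanPair
{
	const char *pruned_plan;	/* removable joins left out */
	const char *full_plan;		/* every join kept */
} ToyPlanPair;

/*
 * One coarse test at executor startup: if anything at all is pending in the
 * after-trigger queue, run the full plan; otherwise the pruned one is safe.
 */
const char *
toy_choose_plan(const ToyPlanPair *plans, bool trigger_queue_empty)
{
	return trigger_queue_empty ? plans->pruned_plan : plans->full_plan;
}
```

The point of the coarse test is that it needs no knowledge of which tables the pending triggers belong to, so only two plans ever have to be kept.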

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#42

Simon Riggs

simon@2ndQuadrant.com

about 11 years ago

In reply to: Simon Riggs (#41)

Re: Removing INNER JOINs

On 4 December 2014 at 12:24, Simon Riggs <simon@2ndquadrant.com> wrote:

On 3 December 2014 at 12:18, Atri Sharma <atri.jiit@gmail.com> wrote:

So the planner keeps all possibility satisfying plans, or it looks at the
possible conditions (like presence of foreign key for this case, for eg) and
then lets executor choose between them?

I'm suggesting the planner keeps 2 plans: One with removable joins,
one without the removable joins.

I only just noticed the thread moved on while I was flying.

So it looks like Tom and I said the same thing, or close enough for me to +1 Tom.

Another idea would be to only skip Hash and Merge Joins, since the
tests for those are fairly easy to put into the Init call. That sounds
slightly easier than the proposal with the Option/Choice/Switch node.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#43

David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: Tom Lane (#21)

1 attachment(s)

Re: Removing INNER JOINs

On 4 December 2014 at 06:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Far better would be what I mentioned upthread: an explicit switch node
in the plan tree, analogous to the existing AlternativeSubPlan structure.

ChooseJoinSubPlan
-> plan tree requiring all tables to be joined
-> plan tree not requiring all tables to be joined

This allows sensible display by EXPLAIN and avoids the need for the core
executor code to be dirtied with implementation of the precise switch
rule: all that logic goes into the ChooseJoinSubPlan plan node code.

I'm not sure if I 100% understand what you mean by sensible EXPLAIN output?

Would you have both complete plans under a ChooseJoinSubPlan? If so I don't
yet understand why that is needed. Plain old EXPLAIN (without analyze)
performs an executor init, which I guess must have obtained a snapshot
somewhere along the line. Couldn't the plan just be selected at init time,
with the ChooseJoinSubPlan (or whatever) selecting the correct plan?

I've attached a version which does this, and I think the EXPLAIN is quite
sensible... It shows the join removed plan when joins are removed, and
shows the "All Purpose" plan when there are FK triggers pending.

The patch has a few fixme's in it, but I think it's enough to demonstrate
how I think it could work.

The bulk of my changes are in allpaths.c, planmain.c and planner.c. The
critical change is query_planner() now returns a List instead of
a RelOptInfo. I wasn't quite sure how else to handle this. Please also
notice the change to make_one_rel(). This function is now called twice if
remove_useless_joins() found 1 or more INNER JOINs to be possibly
removable. In remove_useless_joins() the rels are not marked as
RELOPT_DEADREL like they are with LEFT JOINs; they remain as
RELOPT_BASEREL, but have skipFlags set to mark that they can be
removed when there are no FK triggers pending. A flag on PlannerGlobal is
also set which will later force make_one_rel() to be called twice. Once for
the join removal plan, and once for the "All Purpose" plan. query_planner()
then returns a list of the RelOptInfos of those 2 final rels created by
make_one_rel(). All the processing that previously got done on that final
rel now gets done on the list of final rels. If there's more than 1 in that
list then I'm making the root node of the plan an "AlternativePlan" node.
On init of this node during execution time there is some logic which
chooses which plan to execute. The code here just calls ExecInitNode() on
the root node of the selected plan and returns that, thus skipping over the
AlternativePlan node, so that it can't be seen by EXPLAIN or EXPLAIN
ANALYZE.
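
The filtering step can be pictured with a simplified, standalone sketch. The types and names below are hypothetical stand-ins, not the patch's RelOptInfo or make_one_rel(); only the bit test mirrors what the patch does with skipFlags.

```c
#include <stddef.h>

#define TOY_SKIP_FK_TRIGGER_EMPTY 0x01	/* hypothetical flag bit */

typedef struct ToyBaseRel
{
	int			relid;
	int			skip_flags;		/* reasons this rel may be left out */
} ToyBaseRel;

/*
 * Collect the relids that take part in one join search.  Passing 0 keeps
 * every relation (the "All Purpose" plan); passing a flag leaves out the
 * relations marked with it (the join-removal plan).
 */
size_t
toy_collect_rels(const ToyBaseRel *rels, size_t nrels, int skipflags,
				 int *out_relids)
{
	size_t		nkept = 0;

	for (size_t i = 0; i < nrels; i++)
	{
		if (rels[i].skip_flags & skipflags)
			continue;
		out_relids[nkept++] = rels[i].relid;
	}
	return nkept;
}
```

Running it once with skipflags = 0 and once with the flag set corresponds to the two calls that produce the two final rels.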

The patch has grown quite a bit, but most of the extra size was generated
from having to indent a whole bunch of code in grouping_planner() by 1 more
tab.

All regression tests pass and we're back to getting the most efficient
plans again, i.e. no extra redundant sort node on merge joins! :-)

I'm keen to know if the attached patch is a viable option.

Oh, and if anyone is wondering why I made skipFlags a bit mask and not just
a bool: I think it would also be possible to expand LEFT JOIN removal at
some later date to allow deferrable unique constraints to remove LEFT
JOINs. This is currently not possible due to the planner not knowing if
there will be unique violations when the plan is executed. Quite possibly
this could be handled in a similar way to how the INNER JOINs work in the
attached.
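
To sketch why a bit mask leaves room for that: with more than one condition, a plan is usable only if every condition it was built under holds at execution time. The second flag below is hypothetical and does not exist in the patch; the names are made up for illustration.

```c
#include <stdbool.h>

#define TOY_FK_TRIGGER_EMPTY	0x01	/* no FK triggers pending */
#define TOY_UNIQUE_NOT_DEFERRED	0x02	/* hypothetical: no deferred unique
										 * checks pending */

/*
 * A plan built under some assumptions may run only if every assumption it
 * relies on holds at execution time.
 */
bool
toy_plan_usable(int plan_assumes, int conditions_holding)
{
	return (plan_assumes & ~conditions_holding) == 0;
}
```

A plain bool could only express one such condition; with bits, a plan that assumes nothing (mask 0) is always usable, which matches the "All Purpose" plan.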

Also, apologies for the late response on this. I've been busy most
evenings this week so far. I was quite happy when I woke up in the morning
to see all the responses about this. So thank you everyone for generating
some great ideas. Hopefully I'll get this one nailed down soon.

Regards

David Rowley

I would envision the planner starting out generating the first subplan
(without the optimization), but as it goes along, noting whether there
are any opportunities for join removal. At the end, if it found that
there were such opportunities, re-plan assuming that removal is possible.
Then stick a switch node on top.

This would give optimal plans for both cases, and it would avoid the need
for lots of extra planner cycles when the optimization can't be applied
... except for one small detail, which is that the planner has a bad habit
of scribbling on its own input. I'm not sure how much cleanup work would
be needed before that "re-plan" operation could happen as easily as is
suggested above. But in principle this could be made to work.

regards, tom lane

Attachments:

inner_join_removals_2014-12-10_7dc6756.patch (application/octet-stream)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index ebccfea..ea26615 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3889,6 +3889,17 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	return all_fired;
 }
 
+/* ----------
+ * AfterTriggerQueueIsEmpty()
+ *
+ *	True if there are no pending triggers in the queue.
+ * ----------
+ */
+bool
+AfterTriggerQueueIsEmpty(void)
+{
+	return (afterTriggers.query_depth == -1 && afterTriggers.events.head == NULL);
+}
 
 /* ----------
  * AfterTriggerBeginXact()
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index e27c062..59dd3f3 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -77,6 +77,7 @@
  */
 #include "postgres.h"
 
+#include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
 #include "executor/nodeAppend.h"
@@ -115,6 +116,24 @@
 #include "miscadmin.h"
 
 
+/* FIXME: this needs to be moved into the correct place */
+PlanState *
+ExecInitAlternativePlan(AlternativePlan *node, EState *estate, int eflags)
+{
+	/*
+	 * FIXME: this is not very robust, it would be better to store some flags
+	 * somewhere to indicate which plan is suitable for which purpose. For now
+	 * this is ok as we only generate a maximum of 2 plans, and the all purpose
+	 * plan is always the 2nd one in the list. Here we simply initialize the
+	 * correct plan and return the plan state from the root node of that plan;
+	 * this completely eliminates the alternative plan node from the plan, so
+	 * it should never be seen by EXPLAIN or EXPLAIN ANALYZE.
+	 */
+	if (!AfterTriggerQueueIsEmpty())
+		return (PlanState *) ExecInitNode((Plan *) list_nth(node->planList, 1), estate, eflags);
+	else
+		return (PlanState *) ExecInitNode((Plan *) linitial(node->planList), estate, eflags);
+}
 /* ------------------------------------------------------------------------
  *		ExecInitNode
  *
@@ -147,6 +166,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 			/*
 			 * control nodes
 			 */
+		case T_AlternativePlan:
+			result = (PlanState *) ExecInitAlternativePlan((AlternativePlan *)node,
+												  estate, eflags);
+			break;
+
 		case T_Result:
 			result = (PlanState *) ExecInitResult((Result *) node,
 												  estate, eflags);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 6b1bf7b..8bc9bf2 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -93,6 +93,7 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_NODE_FIELD(relationOids);
 	COPY_NODE_FIELD(invalItems);
 	COPY_SCALAR_FIELD(nParamExec);
+	COPY_SCALAR_FIELD(suitableFor);
 
 	return newnode;
 }
@@ -963,6 +964,16 @@ _copyLimit(const Limit *from)
 	return newnode;
 }
 
+static AlternativePlan *
+_copyAlternativePlan(const AlternativePlan *from)
+{
+	AlternativePlan *newnode = makeNode(AlternativePlan);
+
+	COPY_NODE_FIELD(planList);
+
+	return newnode;
+}
+
 /*
  * _copyNestLoopParam
  */
@@ -4118,6 +4129,9 @@ copyObject(const void *from)
 		case T_Limit:
 			retval = _copyLimit(from);
 			break;
+		case T_AlternativePlan:
+			retval = _copyAlternativePlan(from);
+			break;
 		case T_NestLoopParam:
 			retval = _copyNestLoopParam(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index edbd09f..b3af253 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -255,6 +255,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_FIELD(relationOids);
 	WRITE_NODE_FIELD(invalItems);
 	WRITE_INT_FIELD(nParamExec);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 /*
@@ -1716,6 +1717,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
 	WRITE_UINT_FIELD(lastPHId);
 	WRITE_UINT_FIELD(lastRowMarkId);
 	WRITE_BOOL_FIELD(transientPlan);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 static void
@@ -1801,6 +1803,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	/* we don't try to print fdwroutine or fdw_private */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_NODE_FIELD(joininfo);
+	WRITE_INT_FIELD(skipFlags);
 	WRITE_BOOL_FIELD(has_eclass_joins);
 }
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 449fdc3..3b6d51d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,7 +97,8 @@ static void set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					   RangeTblEntry *rte);
-static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
+static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist,
+					   int skipflags);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
 						  pushdown_safety_info *safetyInfo);
 static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
@@ -122,7 +123,7 @@ static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
  *	  single rel that represents the join of all base rels in the query.
  */
 RelOptInfo *
-make_one_rel(PlannerInfo *root, List *joinlist)
+make_one_rel(PlannerInfo *root, List *joinlist, int skipflags)
 {
 	RelOptInfo *rel;
 	Index		rti;
@@ -142,7 +143,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 		Assert(brel->relid == rti);		/* sanity check on array */
 
 		/* ignore RTEs that are "other rels" */
-		if (brel->reloptkind != RELOPT_BASEREL)
+		if (brel->reloptkind != RELOPT_BASEREL ||
+			brel->skipFlags & skipflags)
 			continue;
 
 		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
@@ -157,12 +159,13 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	/*
 	 * Generate access paths for the entire join tree.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	rel = make_rel_from_joinlist(root, joinlist, skipflags);
+
 
 	/*
 	 * The result should join all and only the query's base rels.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	Assert(bms_is_subset(root->all_baserels, rel->relids));
 
 	return rel;
 }
@@ -1496,7 +1499,7 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * data structure.
  */
 static RelOptInfo *
-make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
+make_rel_from_joinlist(PlannerInfo *root, List *joinlist, int skipflags)
 {
 	int			levels_needed;
 	List	   *initial_rels;
@@ -1528,11 +1531,22 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 			int			varno = ((RangeTblRef *) jlnode)->rtindex;
 
 			thisrel = find_base_rel(root, varno);
+
+			/*
+			 * if this relation can be skipped for these skipflags, then we'll
+			 * not bother including this in the list of relations to join to
+			 */
+			if ((thisrel->skipFlags & skipflags))
+			{
+				/* one less level needed too */
+				levels_needed--;
+				continue;
+			}
 		}
 		else if (IsA(jlnode, List))
 		{
 			/* Recurse to handle subproblem */
-			thisrel = make_rel_from_joinlist(root, (List *) jlnode);
+			thisrel = make_rel_from_joinlist(root, (List *) jlnode, skipflags);
 		}
 		else
 		{
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 9919d27..33f8a90 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -49,8 +49,6 @@ static List *generate_join_implied_equalities_broken(PlannerInfo *root,
 										Relids outer_relids,
 										Relids nominal_inner_relids,
 										RelOptInfo *inner_rel);
-static Oid select_equality_operator(EquivalenceClass *ec,
-						 Oid lefttype, Oid righttype);
 static RestrictInfo *create_join_clause(PlannerInfo *root,
 				   EquivalenceClass *ec, Oid opno,
 				   EquivalenceMember *leftem,
@@ -1282,7 +1280,7 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
  *
  * Returns InvalidOid if no operator can be found for this datatype combination
  */
-static Oid
+Oid
 select_equality_operator(EquivalenceClass *ec, Oid lefttype, Oid righttype)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index e99d416..df0d42f 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -32,13 +32,21 @@
 #include "utils/lsyscache.h"
 
 /* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					  RangeTblRef *removalrtr, Relids ignoredrels);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool relation_is_needed(PlannerInfo *root, Relids joinrelids,
+					  RelOptInfo *rel, Relids ignoredrels);
+static bool relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+					  RelOptInfo *referencedrel, List *referencing_vars,
+					  List *index_vars, List *operator_list);
+static bool expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					  List *indexvars, List *operators);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 					  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
 
-
 /*
  * remove_useless_joins
  *		Check for relations that don't actually need to be joined at all,
@@ -46,26 +54,101 @@ static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  *
  * We are passed the current joinlist and return the updated list.  Other
  * data structures that have to be updated are accessible via "root".
+ *
+ * There are 2 methods here for removing joins. Joins such as LEFT JOINs
+ * which can be proved to be needless due to lack of use of any of the joining
+ * relation's columns and the existence of a unique index on a subset of the
+ * join clause, can simply be removed from the query plan at plan time. For
+ * certain other join types we make use of foreign keys to attempt to prove the
+ * join is needless, though, for these we're unable to be certain that the join
+ * is not required at plan time, as if the plan is executed when pending
+ * foreign key triggers have not yet been fired, then the foreign key is
+ * effectively violated until these triggers have fired. Removing a join in
+ * such a case could cause a query to produce incorrect results.
+ *
+ * Instead we handle this case by marking the RangeTblEntry for the relation
+ * with a special flag which tells the executor that it's possible that joining
+ * to this relation may not be required. The executor may then check this flag
+ * and choose to skip the join based on if there are foreign key triggers
+ * pending or not.
  */
 List *
 remove_useless_joins(PlannerInfo *root, List *joinlist)
 {
 	ListCell   *lc;
+	Relids		removedrels = NULL;
 
 	/*
-	 * We are only interested in relations that are left-joined to, so we can
-	 * scan the join_info_list to find them easily.
+	 * Start by analyzing INNER JOINed relations in order to determine if any
+	 * of the relations can be ignored.
 	 */
 restart:
+	foreach(lc, joinlist)
+	{
+		RangeTblRef		*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo		*rel;
+
+		if (!IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = root->simple_rel_array[rtr->rtindex];
+
+		/* Don't try to remove this one again if we've already removed it */
+		if ((rel->skipFlags & PLAN_SUITABILITY_FK_TRIGGER_EMPTY) != 0)
+			continue;
+
+		/* skip if the join can't be removed */
+		if (!innerjoin_is_removable(root, joinlist, rtr, removedrels))
+			continue;
+
+		/*
+		 * Since we're not actually removing the join here, we need to maintain
+		 * a list of relations that we've "removed" so when we're checking if
+		 * other relations can be removed we'll know that if the to be removed
+		 * relation is only referenced by a relation that we've already removed
+		 * that it can be safely assumed that the relation is not referenced by
+		 * any useful relation.
+		 */
+		removedrels = bms_add_member(removedrels, rtr->rtindex);
+
+		/*
+		 * Mark that this relation is only required when the foreign key trigger
+		 * queue is non-empty.
+		 */
+		rel->skipFlags |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Globally mark the plan to say it has skippable nodes. This lets
+		 * the executor know if it should bother trying to pull the joins
+		 * out of the plan tree
+		 */
+		root->glob->suitableFor |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that since we've added
+		 * this relation to the removedrels, we may now realize that other
+		 * relations can also be removed as they're only referenced by the one
+		 * that we've just marked as possibly removable).
+		 */
+		goto restart;
+	}
+
+	/* now process special joins. Currently only left joins are supported */
 	foreach(lc, root->join_info_list)
 	{
 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
 		int			innerrelid;
 		int			nremoved;
 
-		/* Skip if not removable */
-		if (!join_is_removable(root, sjinfo))
-			continue;
+		if (sjinfo->jointype == JOIN_LEFT)
+		{
+			/* Skip if not removable */
+			if (!leftjoin_is_removable(root, sjinfo))
+				continue;
+		}
+		else
+			continue; /* we don't support this join type */
 
 		/*
 		 * Currently, join_is_removable can only succeed when the sjinfo's
@@ -91,12 +174,11 @@ restart:
 		root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
 
 		/*
-		 * Restart the scan.  This is necessary to ensure we find all
-		 * removable joins independently of ordering of the join_info_list
-		 * (note that removal of attr_needed bits may make a join appear
-		 * removable that did not before).  Also, since we just deleted the
-		 * current list cell, we'd have to have some kluge to continue the
-		 * list scan anyway.
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that removal of
+		 * attr_needed bits may make a join, inner or outer, appear removable
+		 * that did not before).   Also, since we just deleted the current list
+		 * cell, we'd have to have some kluge to continue the list scan anyway.
 		 */
 		goto restart;
 	}
@@ -136,8 +218,213 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
 }
 
 /*
- * join_is_removable
- *	  Check whether we need not perform this special join at all, because
+ * innerjoin_is_removable
+ *		True if the join to removalrtr can be removed.
+ *
+ * In order to prove a relation which is inner joined is not required we must
+ * be sure that the join would emit exactly 1 row on the join condition. This
+ * differs from the logic which is used for proving LEFT JOINs can be removed,
+ * where it's possible to just check that a unique index exists on the relation
+ * being removed which has a set of columns that is a subset of the columns
+ * seen in the join condition. If no matching row is found then left join would
+ * not remove the non-matched row from the result set. This is not the case
+ * with INNER JOINs, so here we must use foreign keys as proof that the 1 row
+ * exists before we can allow any joins to be removed.
+ */
+static bool
+innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					   RangeTblRef *removalrtr, Relids ignoredrels)
+{
+	ListCell   *lc;
+	RelOptInfo *removalrel;
+
+	removalrel = find_base_rel(root, removalrtr->rtindex);
+
+	/*
+	 * As foreign keys may only reference base rels which have unique indexes,
+	 * we needn't go any further if we're not dealing with a base rel, or if
+	 * the base rel has no unique indexes. We'd also better abort if the
+	 * rtekind is anything but a relation, as things like sub-queries may have
+	 * grouping or distinct clauses that would cause us not to be able to use
+	 * the foreign key to prove the existence of a row matching the join
+	 * condition. We also abort if the rel has no eclass joins as such a rel
+	 * could well be joined using some operator which is not an equality
+	 * operator, or the rel may not even be inner joined at all.
+	 *
+	 * Here we actually only check if the rel has any indexes, ideally we'd be
+	 * checking for unique indexes, but we could only determine that by looping
+	 * over the indexlist, and this is likely too expensive a check to be worth
+	 * it here.
+	 */
+	if (removalrel->reloptkind != RELOPT_BASEREL ||
+		removalrel->rtekind != RTE_RELATION ||
+		removalrel->has_eclass_joins == false ||
+		removalrel->indexlist == NIL)
+		return false;
+
+	/*
+	 * Currently we disallow the removal if we find any baserestrictinfo items
+	 * on the relation being removed. The reason for this is that these would
+	 * filter out rows and make it so the foreign key cannot prove that we'll
+	 * match exactly 1 row on the join condition. However, this check is
+	 * currently probably a bit overly strict as it should be possible to just
+	 * check and ensure that each Var seen in the baserestrictinfo is also
+	 * present in an eclass and if so, just translate and move the whole
+	 * baserestrictinfo over to the relation which has the foreign key to prove
+	 * that this join is not needed. e.g:
+	 * SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id WHERE b.id = 1;
+	 * could become: SELECT a.* FROM a WHERE a.b_id = 1;
+	 */
+	if (removalrel->baserestrictinfo != NIL)
+		return false;
+
+	/*
+	 * Currently only eclass joins are supported, so if there are any non
+	 * eclass join quals then we'll report the join is non-removable.
+	 */
+	if (removalrel->joininfo != NIL)
+		return false;
+
+	/*
+	 * Now we'll search through each relation in the joinlist to see if we can
+	 * find a relation which has a foreign key which references removalrel on
+	 * the join condition. If we find a rel with a foreign key which matches
+	 * the join condition exactly, then we can be sure that exactly 1 row will
+	 * be matched on the join. If we also see that no Vars from the relation
+	 * are needed, then we can report the join as removable.
+	 */
+	foreach (lc, joinlist)
+	{
+		RangeTblRef	*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo	*rel;
+		ListCell	*lc2;
+		List		*referencing_vars;
+		List		*index_vars;
+		List		*operator_list;
+		Relids		 joinrelids;
+
+		/* we can't remove ourself, or anything other than RangeTblRefs */
+		if (rtr == removalrtr || !IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = find_base_rel(root, rtr->rtindex);
+
+		/*
+		 * The only relation type that can help us is a base rel with at least
+		 * one foreign key defined. If there are no eclass joins then this rel
+		 * is not going to help us prove the removalrel is not needed.
+		 */
+		if (rel->reloptkind != RELOPT_BASEREL ||
+			rel->rtekind != RTE_RELATION ||
+			rel->has_eclass_joins == false ||
+			rel->fklist == NIL)
+			continue;
+
+		/*
+		 * Both rels have eclass joins, but do they have eclass joins to each
+		 * other? Skip this rel if it does not.
+		 */
+		if (!have_relevant_eclass_joinclause(root, rel, removalrel))
+			continue;
+
+		joinrelids = bms_union(rel->relids, removalrel->relids);
+
+		/* if any of the Vars from the relation are needed then abort */
+		if (relation_is_needed(root, joinrelids, removalrel, ignoredrels))
+			return false;
+
+		referencing_vars = NIL;
+		index_vars = NIL;
+		operator_list = NIL;
+
+		/* now populate the lists with the join condition Vars */
+		foreach(lc2, root->eq_classes)
+		{
+			EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc2);
+
+			if (list_length(ec->ec_members) <= 1)
+				continue;
+
+			if (bms_overlap(removalrel->relids, ec->ec_relids) &&
+				bms_overlap(rel->relids, ec->ec_relids))
+			{
+				ListCell *lc3;
+				Var *refvar = NULL;
+				Var *idxvar = NULL;
+
+				/*
+				 * Look at each member of the eclass and try to find a Var from
+				 * each side of the join that we can append to the list of
+				 * columns that should be checked against each foreign key.
+				 *
+				 * The following logic does not allow for join removals to take
+				 * place for foreign keys that have duplicate columns on the
+				 * referencing side of the foreign key, such as:
+				 * (a,a) references (x,y)
+				 * The use case for such a foreign key is likely small enough
+				 * that we needn't bother making this code any more complex to
+				 * solve. If we find more than 1 Var from any of the rels then
+				 * we'll bail out.
+				 */
+				foreach (lc3, ec->ec_members)
+				{
+					EquivalenceMember *ecm = (EquivalenceMember *) lfirst(lc3);
+
+					Var *var = (Var *) ecm->em_expr;
+
+					if (!IsA(var, Var))
+						continue; /* Ignore Consts */
+
+					if (var->varno == rel->relid)
+					{
+						if (refvar != NULL)
+							return false;
+						refvar = var;
+					}
+
+					else if (var->varno == removalrel->relid)
+					{
+						if (idxvar != NULL)
+							return false;
+						idxvar = var;
+					}
+				}
+
+				if (refvar != NULL && idxvar != NULL)
+				{
+					Oid opno;
+					Oid reloid = root->simple_rte_array[refvar->varno]->relid;
+
+					if (!get_attnotnull(reloid, refvar->varattno))
+						return false;
+
+					/* grab the correct equality operator for these two vars */
+					opno = select_equality_operator(ec, refvar->vartype, idxvar->vartype);
+
+					if (!OidIsValid(opno))
+						return false;
+
+					referencing_vars = lappend(referencing_vars, refvar);
+					index_vars = lappend(index_vars, idxvar);
+					operator_list = lappend_oid(operator_list, opno);
+				}
+			}
+		}
+
+		if (referencing_vars != NIL)
+		{
+			if (relation_has_foreign_key_for(root, rel, removalrel,
+				referencing_vars, index_vars, operator_list))
+				return true; /* removalrel can be removed */
+		}
+	}
+
+	return false; /* can't remove join */
+}
+
+/*
+ * leftjoin_is_removable
+ *	  Check whether we need not perform this left join at all, because
  *	  it will just duplicate its left input.
  *
  * This is true for a left join for which the join condition cannot match
@@ -147,7 +434,7 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
  * above the join.
  */
 static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 {
 	int			innerrelid;
 	RelOptInfo *innerrel;
@@ -155,14 +442,14 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	Relids		joinrelids;
 	List	   *clause_list = NIL;
 	ListCell   *l;
-	int			attroff;
+
+	Assert(sjinfo->jointype == JOIN_LEFT);
 
 	/*
-	 * Must be a non-delaying left join to a single baserel, else we aren't
+	 * Must be a non-delaying join to a single baserel, else we aren't
 	 * going to be able to do anything with it.
 	 */
-	if (sjinfo->jointype != JOIN_LEFT ||
-		sjinfo->delay_upper_joins)
+	if (sjinfo->delay_upper_joins)
 		return false;
 
 	if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
@@ -206,52 +493,9 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
-	/*
-	 * We can't remove the join if any inner-rel attributes are used above the
-	 * join.
-	 *
-	 * Note that this test only detects use of inner-rel attributes in higher
-	 * join conditions and the target list.  There might be such attributes in
-	 * pushed-down conditions at this join, too.  We check that case below.
-	 *
-	 * As a micro-optimization, it seems better to start with max_attr and
-	 * count down rather than starting with min_attr and counting up, on the
-	 * theory that the system attributes are somewhat less likely to be wanted
-	 * and should be tested last.
-	 */
-	for (attroff = innerrel->max_attr - innerrel->min_attr;
-		 attroff >= 0;
-		 attroff--)
-	{
-		if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids))
-			return false;
-	}
-
-	/*
-	 * Similarly check that the inner rel isn't needed by any PlaceHolderVars
-	 * that will be used above the join.  We only need to fail if such a PHV
-	 * actually references some inner-rel attributes; but the correct check
-	 * for that is relatively expensive, so we first check against ph_eval_at,
-	 * which must mention the inner rel if the PHV uses any inner-rel attrs as
-	 * non-lateral references.  Note that if the PHV's syntactic scope is just
-	 * the inner rel, we can't drop the rel even if the PHV is variable-free.
-	 */
-	foreach(l, root->placeholder_list)
-	{
-		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
-
-		if (bms_is_subset(phinfo->ph_needed, joinrelids))
-			continue;			/* PHV is not used above the join */
-		if (bms_overlap(phinfo->ph_lateral, innerrel->relids))
-			return false;		/* it references innerrel laterally */
-		if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids))
-			continue;			/* it definitely doesn't reference innerrel */
-		if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
-			return false;		/* there isn't any other place to eval PHV */
-		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
-						innerrel->relids))
-			return false;		/* it does reference innerrel */
-	}
+	/* if the relation is referenced in the query then it cannot be removed */
+	if (relation_is_needed(root, joinrelids, innerrel, NULL))
+		return false;
 
 	/*
 	 * Search for mergejoinable clauses that constrain the inner rel against
@@ -368,6 +612,218 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	return false;
 }
 
+/*
+ * relation_is_needed
+ *		True if any of the Vars from this relation are required in the query
+ */
+static inline bool
+relation_is_needed(PlannerInfo *root, Relids joinrelids, RelOptInfo *rel, Relids ignoredrels)
+{
+	int		  attroff;
+	ListCell *l;
+
+	/*
+	 * rel is referenced if any of its attributes are used above the join.
+	 *
+	 * Note that this test only detects use of rel's attributes in higher
+	 * join conditions and the target list.  There might be such attributes in
+	 * pushed-down conditions at this join, too.  We check that case below.
+	 *
+	 * As a micro-optimization, it seems better to start with max_attr and
+	 * count down rather than starting with min_attr and counting up, on the
+	 * theory that the system attributes are somewhat less likely to be wanted
+	 * and should be tested last.
+	 */
+	for (attroff = rel->max_attr - rel->min_attr;
+		 attroff >= 0;
+		 attroff--)
+	{
+		if (!bms_is_subset(bms_difference(rel->attr_needed[attroff], ignoredrels), joinrelids))
+			return true;
+	}
+
+	/*
+	 * Similarly check that rel isn't needed by any PlaceHolderVars that will
+	 * be used above the join.  We only need to fail if such a PHV actually
+	 * references some of rel's attributes; but the correct check for that is
+	 * relatively expensive, so we first check against ph_eval_at, which must
+	 * mention rel if the PHV uses any of rel's attrs as non-lateral
+	 * references.  Note that if the PHV's syntactic scope is just rel, we
+	 * must return true even if the PHV is variable-free.
+	 */
+	foreach(l, root->placeholder_list)
+	{
+		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
+
+		if (bms_is_subset(phinfo->ph_needed, joinrelids))
+			continue;			/* PHV is not used above the join */
+		if (bms_overlap(phinfo->ph_lateral, rel->relids))
+			return true;		/* it references rel laterally */
+		if (!bms_overlap(phinfo->ph_eval_at, rel->relids))
+			continue;			/* it definitely doesn't reference rel */
+		if (bms_is_subset(phinfo->ph_eval_at, rel->relids))
+			return true;		/* there isn't any other place to eval PHV */
+		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
+						rel->relids))
+			return true;		/* it does reference rel */
+	}
+
+	return false; /* it does not reference rel */
+}
+
+/*
+ * relation_has_foreign_key_for
+ *	  Checks if rel has a foreign key which references referencedrel with the
+ *	  given list of expressions.
+ *
+ *	For the match to succeed:
+ *	  referencing_vars must match the columns defined in the foreign key.
+ *	  index_vars must match the columns defined in the index for the foreign key.
+ */
+static bool
+relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+			RelOptInfo *referencedrel, List *referencing_vars,
+			List *index_vars, List *operator_list)
+{
+	ListCell *lc;
+	Oid		  refreloid;
+
+	/*
+	 * Look up the Oid of the referenced relation. We only want to look at
+	 * foreign keys on the referencing relation which reference this relation.
+	 */
+	refreloid = root->simple_rte_array[referencedrel->relid]->relid;
+
+	Assert(list_length(referencing_vars) > 0);
+	Assert(list_length(referencing_vars) == list_length(index_vars));
+	Assert(list_length(referencing_vars) == list_length(operator_list));
+
+	/*
+	 * Search through each foreign key on the referencing relation and try
+	 * to find one which references the relation in the join condition. If we
+	 * find one then we'll send the join conditions off to
+	 * expressions_match_foreign_key() to see if they match the foreign key.
+	 */
+	foreach(lc, rel->fklist)
+	{
+		ForeignKeyInfo *fk = (ForeignKeyInfo *) lfirst(lc);
+
+		if (fk->confrelid == refreloid)
+		{
+			if (expressions_match_foreign_key(fk, referencing_vars,
+				index_vars, operator_list))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * expressions_match_foreign_key
+ *		True if the given fkvars, indexvars and operators will match
+ *		exactly 1 record in the referenced relation of the foreign key.
+ *
+ * Note: This function expects fkvars and indexvars to only contain Var types.
+ *		 Expression indexes are not supported by foreign keys.
+ */
+static bool
+expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					List *indexvars, List *operators)
+{
+	ListCell  *lc;
+	ListCell  *lc2;
+	ListCell  *lc3;
+	Bitmapset *allitems;
+	Bitmapset *matcheditems;
+	int		   lstidx;
+	int		   col;
+
+	Assert(list_length(fkvars) == list_length(indexvars));
+	Assert(list_length(fkvars) == list_length(operators));
+
+	/*
+	 * Fast path out if there are not enough conditions to match each column
+	 * in the foreign key. Note that we cannot check that the number of
+	 * expressions is equal here since it would cause any expressions which
+	 * are duplicated not to match.
+	 */
+	if (list_length(fkvars) < fk->conncols)
+		return false;
+
+	/*
+	 * We need to ensure that each foreign key column can be matched to a list
+	 * item, and we need to ensure that each list item can be matched to a
+	 * foreign key column. We do this by looping over each foreign key column
+	 * and checking that we can find an item in the list which matches the
+	 * current column, however this method does not allow us to ensure that no
+	 * additional items exist in the list. We could solve that by performing
+	 * another loop over each list item and check that it matches a foreign key
+	 * column, but that's a bit wasteful. Instead we'll use 2 bitmapsets, one
+	 * to store the 0 based index of each list item, and with the other we'll
+	 * store each list index that we've managed to match. After we're done
+	 * matching we'll just make sure that both bitmapsets are equal.
+	 */
+	allitems = NULL;
+	matcheditems = NULL;
+
+	/*
+	 * Build a bitmapset which contains each 0 based list index. It seems more
+	 * efficient to do this in reverse so that we allocate enough memory for
+	 * the bitmapset on first loop rather than reallocating each time we find
+	 * we need a bit more space.
+	 */
+	for (lstidx = list_length(fkvars) - 1; lstidx >= 0; lstidx--)
+		allitems = bms_add_member(allitems, lstidx);
+
+	for (col = 0; col < fk->conncols; col++)
+	{
+		bool  matched = false;
+
+		lstidx = 0;
+
+		forthree(lc, fkvars, lc2, indexvars, lc3, operators)
+		{
+			Var *expr = (Var *) lfirst(lc);
+			Var *idxexpr = (Var *) lfirst(lc2);
+			Oid  opr = lfirst_oid(lc3);
+
+			Assert(IsA(expr, Var));
+			Assert(IsA(idxexpr, Var));
+
+			/* Does this join qual match up to the current fkey column? */
+			if (fk->conkey[col] == expr->varattno &&
+				fk->confkey[col] == idxexpr->varattno &&
+				equality_ops_are_compatible(opr, fk->conpfeqop[col]))
+			{
+				matched = true;
+
+				/* mark this list item as matched */
+				matcheditems = bms_add_member(matcheditems, lstidx);
+
+				/*
+				 * Don't break here as there may be duplicate expressions
+				 * that we also need to match against.
+				 */
+			}
+			lstidx++;
+		}
+
+		/* punt if there's no match. */
+		if (!matched)
+			return false;
+	}
+
+	/*
+	 * Ensure that we managed to match every item in the list to a foreign key
+	 * column.
+	 */
+	if (!bms_equal(allitems, matcheditems))
+		return false;
+
+	return true; /* matched */
+}
+
 
 /*
  * Remove the target relid from the planner's data structures, having
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index bf8dbe0..19333e6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4675,6 +4675,15 @@ make_lockrows(Plan *lefttree, List *rowMarks, int epqParam)
 	return node;
 }
 
+AlternativePlan *
+make_alternativeplan(List *planlist)
+{
+	AlternativePlan *node = makeNode(AlternativePlan);
+	node->planList = planlist;
+
+	return node;
+}
+
 /*
  * Note: offset_est and count_est are passed in to save having to repeat
  * work already done to estimate the values of the limitOffset and limitCount
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 94ca92d..866e0c5 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -409,6 +409,7 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	Path	   *sorted_path;
 	Cost		path_cost;
 	double		path_fraction;
+	List	   *final_rel_list;
 
 	/*----------
 	 * Generate modified query of the form
@@ -478,8 +479,12 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	subroot->tuple_fraction = 1.0;
 	subroot->limit_tuples = 1.0;
 
-	final_rel = query_planner(subroot, parse->targetList,
-							  minmax_qp_callback, NULL);
+	final_rel_list = query_planner(subroot, parse->targetList,
+							  minmax_qp_callback, NULL, false);
+
+	Assert(list_length(final_rel_list) == 1);
+
+	final_rel = (RelOptInfo *) linitial(final_rel_list);
 
 	/*
 	 * Get the best presorted path, that being the one that's cheapest for
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 93484a0..964371f 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -51,15 +51,17 @@
  * qp_callback once we have completed merging the query's equivalence classes.
  * (We cannot construct canonical pathkeys until that's done.)
  */
-RelOptInfo *
+List *
 query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra)
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool alternativePlans)
 {
 	Query	   *parse = root->parse;
 	List	   *joinlist;
 	RelOptInfo *final_rel;
 	Index		rti;
 	double		total_pages;
+	List	   *finalrellist = NIL;
 
 	/*
 	 * If the query has an empty join tree, then it's something easy like
@@ -84,7 +86,7 @@ query_planner(PlannerInfo *root, List *tlist,
 		root->canon_pathkeys = NIL;
 		(*qp_callback) (root, qp_extra);
 
-		return final_rel;
+		return lappend(NIL, final_rel);
 	}
 
 	/*
@@ -231,14 +233,33 @@ query_planner(PlannerInfo *root, List *tlist,
 	root->total_table_pages = total_pages;
 
 	/*
-	 * Ready to do the primary planning.
+	 * If we've marked some joins as removable then, since we won't know
+	 * until execution time whether those relations can really be removed,
+	 * we'd better generate both plans and let the executor pick one.
 	 */
-	final_rel = make_one_rel(root, joinlist);
+	if (root->glob->suitableFor != PLAN_SUITABILITY_ALL_PURPOSE
+		&& alternativePlans == true)
+	{
+		/* Generate fully optimized plan, with all removable joins removed */
+		final_rel = make_one_rel(root, joinlist, root->glob->suitableFor);
+
+		/* Check that we got at least one usable path */
+		if (!final_rel || !final_rel->cheapest_total_path ||
+			final_rel->cheapest_total_path->param_info != NULL)
+			elog(ERROR, "failed to construct the join relation");
+
+		finalrellist = lappend(finalrellist, final_rel);
+	}
+
+	/* generate an all purpose plan */
+	final_rel = make_one_rel(root, joinlist, PLAN_SUITABILITY_ALL_PURPOSE);
 
 	/* Check that we got at least one usable path */
 	if (!final_rel || !final_rel->cheapest_total_path ||
 		final_rel->cheapest_total_path->param_info != NULL)
 		elog(ERROR, "failed to construct the join relation");
 
-	return final_rel;
+	finalrellist = lappend(finalrellist, final_rel);
+
+	return finalrellist;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f752ecc..c8f3a3e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -178,6 +178,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	glob->lastRowMarkId = 0;
 	glob->transientPlan = false;
 	glob->hasRowSecurity = false;
+	glob->suitableFor = PLAN_SUITABILITY_ALL_PURPOSE;
 
 	/* Determine what fraction of the plan is likely to be scanned */
 	if (cursorOptions & CURSOR_OPT_FAST_PLAN)
@@ -256,6 +257,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->invalItems = glob->invalItems;
 	result->nParamExec = glob->nParamExec;
 	result->hasRowSecurity = glob->hasRowSecurity;
+	result->suitableFor = glob->suitableFor;
 
 	return result;
 }
@@ -1087,10 +1089,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	int64		count_est = 0;
 	double		limit_tuples = -1.0;
 	Plan	   *result_plan;
+	List	   *result_plan_list = NIL;
 	List	   *current_pathkeys;
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	ListCell   *lc;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1169,6 +1173,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		root->sort_pathkeys = make_pathkeys_for_sortclauses(root,
 															parse->sortClause,
 															tlist);
+
+		result_plan_list = lappend(NIL, result_plan);
 	}
 	else
 	{
@@ -1178,6 +1184,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		bool		need_tlist_eval = true;
 		standard_qp_extra qp_extra;
 		RelOptInfo *final_rel;
+		List	   *final_rel_list;
 		Path	   *cheapest_path;
 		Path	   *sorted_path;
 		Path	   *best_path;
@@ -1288,710 +1295,723 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		 * standard_qp_callback) pathkey representations of the query's sort
 		 * clause, distinct clause, etc.
 		 */
-		final_rel = query_planner(root, sub_tlist,
-								  standard_qp_callback, &qp_extra);
-
-		/*
-		 * Extract rowcount and width estimates for use below.
-		 */
-		path_rows = final_rel->rows;
-		path_width = final_rel->width;
+		final_rel_list = query_planner(root, sub_tlist,
+							  standard_qp_callback, &qp_extra, true);
 
-		/*
-		 * If there's grouping going on, estimate the number of result groups.
-		 * We couldn't do this any earlier because it depends on relation size
-		 * estimates that are created within query_planner().
-		 *
-		 * Then convert tuple_fraction to fractional form if it is absolute,
-		 * and if grouping or aggregation is involved, adjust tuple_fraction
-		 * to describe the fraction of the underlying un-aggregated tuples
-		 * that will be fetched.
-		 */
-		dNumGroups = 1;			/* in case not grouping */
-
-		if (parse->groupClause)
+		foreach(lc, final_rel_list)
 		{
-			List	   *groupExprs;
-
-			groupExprs = get_sortgrouplist_exprs(parse->groupClause,
-												 parse->targetList);
-			dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
-
+			final_rel = (RelOptInfo *) lfirst(lc);
 			/*
-			 * In GROUP BY mode, an absolute LIMIT is relative to the number
-			 * of groups not the number of tuples.  If the caller gave us a
-			 * fraction, keep it as-is.  (In both cases, we are effectively
-			 * assuming that all the groups are about the same size.)
+			 * Extract rowcount and width estimates for use below.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
+			path_rows = final_rel->rows;
+			path_width = final_rel->width;
 
 			/*
-			 * If both GROUP BY and ORDER BY are specified, we will need two
-			 * levels of sort --- and, therefore, certainly need to read all
-			 * the tuples --- unless ORDER BY is a subset of GROUP BY.
-			 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
-			 * window specification not compatible with the GROUP BY.
-			 */
-			if (!pathkeys_contained_in(root->sort_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->distinct_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->window_pathkeys,
-									   root->group_pathkeys))
-				tuple_fraction = 0.0;
-		}
-		else if (parse->hasAggs || root->hasHavingQual)
-		{
-			/*
-			 * Ungrouped aggregate will certainly want to read all the tuples,
-			 * and it will deliver a single result row (so leave dNumGroups
-			 * set to 1).
-			 */
-			tuple_fraction = 0.0;
-		}
-		else if (parse->distinctClause)
-		{
-			/*
-			 * Since there was no grouping or aggregation, it's reasonable to
-			 * assume the UNIQUE filter has effects comparable to GROUP BY.
-			 * (If DISTINCT is used with grouping, we ignore its effects for
-			 * rowcount estimation purposes; this amounts to assuming the
-			 * grouped rows are distinct already.)
-			 */
-			List	   *distinctExprs;
-
-			distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-													parse->targetList);
-			dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
-
-			/*
-			 * Adjust tuple_fraction the same way as for GROUP BY, too.
-			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
-		}
-		else
-		{
-			/*
-			 * Plain non-grouped, non-aggregated query: an absolute tuple
-			 * fraction can be divided by the number of tuples.
+			 * If there's grouping going on, estimate the number of result groups.
+			 * We couldn't do this any earlier because it depends on relation size
+			 * estimates that are created within query_planner().
+			 *
+			 * Then convert tuple_fraction to fractional form if it is absolute,
+			 * and if grouping or aggregation is involved, adjust tuple_fraction
+			 * to describe the fraction of the underlying un-aggregated tuples
+			 * that will be fetched.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= path_rows;
-		}
+			dNumGroups = 1;			/* in case not grouping */
 
-		/*
-		 * Pick out the cheapest-total path as well as the cheapest presorted
-		 * path for the requested pathkeys (if there is one).  We should take
-		 * the tuple fraction into account when selecting the cheapest
-		 * presorted path, but not when selecting the cheapest-total path,
-		 * since if we have to sort then we'll have to fetch all the tuples.
-		 * (But there's a special case: if query_pathkeys is NIL, meaning
-		 * order doesn't matter, then the "cheapest presorted" path will be
-		 * the cheapest overall for the tuple fraction.)
-		 */
-		cheapest_path = final_rel->cheapest_total_path;
-
-		sorted_path =
-			get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
-													  root->query_pathkeys,
-													  NULL,
-													  tuple_fraction);
+			if (parse->groupClause)
+			{
+				List	   *groupExprs;
 
-		/* Don't consider same path in both guises; just wastes effort */
-		if (sorted_path == cheapest_path)
-			sorted_path = NULL;
+				groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+													 parse->targetList);
+				dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
 
-		/*
-		 * Forget about the presorted path if it would be cheaper to sort the
-		 * cheapest-total path.  Here we need consider only the behavior at
-		 * the tuple_fraction point.  Also, limit_tuples is only relevant if
-		 * not grouping/aggregating, so use root->limit_tuples in the
-		 * cost_sort call.
-		 */
-		if (sorted_path)
-		{
-			Path		sort_path;		/* dummy for result of cost_sort */
+				/*
+				 * In GROUP BY mode, an absolute LIMIT is relative to the number
+				 * of groups not the number of tuples.  If the caller gave us a
+				 * fraction, keep it as-is.  (In both cases, we are effectively
+				 * assuming that all the groups are about the same size.)
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
 
-			if (root->query_pathkeys == NIL ||
-				pathkeys_contained_in(root->query_pathkeys,
-									  cheapest_path->pathkeys))
-			{
-				/* No sort needed for cheapest path */
-				sort_path.startup_cost = cheapest_path->startup_cost;
-				sort_path.total_cost = cheapest_path->total_cost;
+				/*
+				 * If both GROUP BY and ORDER BY are specified, we will need two
+				 * levels of sort --- and, therefore, certainly need to read all
+				 * the tuples --- unless ORDER BY is a subset of GROUP BY.
+				 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
+				 * window specification not compatible with the GROUP BY.
+				 */
+				if (!pathkeys_contained_in(root->sort_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->distinct_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->window_pathkeys,
+										   root->group_pathkeys))
+					tuple_fraction = 0.0;
 			}
-			else
+			else if (parse->hasAggs || root->hasHavingQual)
 			{
-				/* Figure cost for sorting */
-				cost_sort(&sort_path, root, root->query_pathkeys,
-						  cheapest_path->total_cost,
-						  path_rows, path_width,
-						  0.0, work_mem, root->limit_tuples);
+				/*
+				 * Ungrouped aggregate will certainly want to read all the tuples,
+				 * and it will deliver a single result row (so leave dNumGroups
+				 * set to 1).
+				 */
+				tuple_fraction = 0.0;
 			}
-
-			if (compare_fractional_path_costs(sorted_path, &sort_path,
-											  tuple_fraction) > 0)
+			else if (parse->distinctClause)
 			{
-				/* Presorted path is a loser */
-				sorted_path = NULL;
-			}
-		}
+				/*
+				 * Since there was no grouping or aggregation, it's reasonable to
+				 * assume the UNIQUE filter has effects comparable to GROUP BY.
+				 * (If DISTINCT is used with grouping, we ignore its effects for
+				 * rowcount estimation purposes; this amounts to assuming the
+				 * grouped rows are distinct already.)
+				 */
+				List	   *distinctExprs;
 
-		/*
-		 * Consider whether we want to use hashing instead of sorting.
-		 */
-		if (parse->groupClause)
-		{
-			/*
-			 * If grouping, decide whether to use sorted or hashed grouping.
-			 */
-			use_hashed_grouping =
-				choose_hashed_grouping(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path, sorted_path,
-									   dNumGroups, &agg_costs);
-			/* Also convert # groups to long int --- but 'ware overflow! */
-			numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
-		}
-		else if (parse->distinctClause && sorted_path &&
-				 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
-		{
-			/*
-			 * We'll reach the DISTINCT stage without any intermediate
-			 * processing, so figure out whether we will want to hash or not
-			 * so we can choose whether to use cheapest or sorted path.
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path->startup_cost,
-									   cheapest_path->total_cost,
-									   sorted_path->startup_cost,
-									   sorted_path->total_cost,
-									   sorted_path->pathkeys,
-									   dNumGroups);
-			tested_hashed_distinct = true;
-		}
+				distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														parse->targetList);
+				dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
 
-		/*
-		 * Select the best path.  If we are doing hashed grouping, we will
-		 * always read all the input tuples, so use the cheapest-total path.
-		 * Otherwise, the comparison above is correct.
-		 */
-		if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
-			best_path = cheapest_path;
-		else
-			best_path = sorted_path;
+				/*
+				 * Adjust tuple_fraction the same way as for GROUP BY, too.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
+			}
+			else
+			{
+				/*
+				 * Plain non-grouped, non-aggregated query: an absolute tuple
+				 * fraction can be divided by the number of tuples.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= path_rows;
+			}
 
-		/*
-		 * Check to see if it's possible to optimize MIN/MAX aggregates. If
-		 * so, we will forget all the work we did so far to choose a "regular"
-		 * path ... but we had to do it anyway to be able to tell which way is
-		 * cheaper.
-		 */
-		result_plan = optimize_minmax_aggregates(root,
-												 tlist,
-												 &agg_costs,
-												 best_path);
-		if (result_plan != NULL)
-		{
-			/*
-			 * optimize_minmax_aggregates generated the full plan, with the
-			 * right tlist, and it has no sort order.
-			 */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Normal case --- create a plan according to query_planner's
-			 * results.
+			 * Pick out the cheapest-total path as well as the cheapest presorted
+			 * path for the requested pathkeys (if there is one).  We should take
+			 * the tuple fraction into account when selecting the cheapest
+			 * presorted path, but not when selecting the cheapest-total path,
+			 * since if we have to sort then we'll have to fetch all the tuples.
+			 * (But there's a special case: if query_pathkeys is NIL, meaning
+			 * order doesn't matter, then the "cheapest presorted" path will be
+			 * the cheapest overall for the tuple fraction.)
 			 */
-			bool		need_sort_for_grouping = false;
+			cheapest_path = final_rel->cheapest_total_path;
 
-			result_plan = create_plan(root, best_path);
-			current_pathkeys = best_path->pathkeys;
+			sorted_path =
+				get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+														  root->query_pathkeys,
+														  NULL,
+														  tuple_fraction);
 
-			/* Detect if we'll need an explicit sort for grouping */
-			if (parse->groupClause && !use_hashed_grouping &&
-			  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
-			{
-				need_sort_for_grouping = true;
-
-				/*
-				 * Always override create_plan's tlist, so that we don't sort
-				 * useless data from a "physical" tlist.
-				 */
-				need_tlist_eval = true;
-			}
+			/* Don't consider same path in both guises; just wastes effort */
+			if (sorted_path == cheapest_path)
+				sorted_path = NULL;
 
 			/*
-			 * create_plan returns a plan with just a "flat" tlist of required
-			 * Vars.  Usually we need to insert the sub_tlist as the tlist of
-			 * the top plan node.  However, we can skip that if we determined
-			 * that whatever create_plan chose to return will be good enough.
+			 * Forget about the presorted path if it would be cheaper to sort the
+			 * cheapest-total path.  Here we need consider only the behavior at
+			 * the tuple_fraction point.  Also, limit_tuples is only relevant if
+			 * not grouping/aggregating, so use root->limit_tuples in the
+			 * cost_sort call.
 			 */
-			if (need_tlist_eval)
+			if (sorted_path)
 			{
-				/*
-				 * If the top-level plan node is one that cannot do expression
-				 * evaluation and its existing target list isn't already what
-				 * we need, we must insert a Result node to project the
-				 * desired tlist.
-				 */
-				if (!is_projection_capable_plan(result_plan) &&
-					!tlist_same_exprs(sub_tlist, result_plan->targetlist))
+				Path		sort_path;		/* dummy for result of cost_sort */
+
+				if (root->query_pathkeys == NIL ||
+					pathkeys_contained_in(root->query_pathkeys,
+										  cheapest_path->pathkeys))
 				{
-					result_plan = (Plan *) make_result(root,
-													   sub_tlist,
-													   NULL,
-													   result_plan);
+					/* No sort needed for cheapest path */
+					sort_path.startup_cost = cheapest_path->startup_cost;
+					sort_path.total_cost = cheapest_path->total_cost;
 				}
 				else
 				{
-					/*
-					 * Otherwise, just replace the subplan's flat tlist with
-					 * the desired tlist.
-					 */
-					result_plan->targetlist = sub_tlist;
+					/* Figure cost for sorting */
+					cost_sort(&sort_path, root, root->query_pathkeys,
+							  cheapest_path->total_cost,
+							  path_rows, path_width,
+							  0.0, work_mem, root->limit_tuples);
 				}
 
+				if (compare_fractional_path_costs(sorted_path, &sort_path,
+												  tuple_fraction) > 0)
+				{
+					/* Presorted path is a loser */
+					sorted_path = NULL;
+				}
+			}
+
+			/*
+			 * Consider whether we want to use hashing instead of sorting.
+			 */
+			if (parse->groupClause)
+			{
 				/*
-				 * Also, account for the cost of evaluation of the sub_tlist.
-				 * See comments for add_tlist_costs_to_plan() for more info.
+				 * If grouping, decide whether to use sorted or hashed grouping.
 				 */
-				add_tlist_costs_to_plan(root, result_plan, sub_tlist);
+				use_hashed_grouping =
+					choose_hashed_grouping(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path, sorted_path,
+										   dNumGroups, &agg_costs);
+				/* Also convert # groups to long int --- but 'ware overflow! */
+				numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 			}
-			else
+			else if (parse->distinctClause && sorted_path &&
+					 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
 			{
 				/*
-				 * Since we're using create_plan's tlist and not the one
-				 * make_subplanTargetList calculated, we have to refigure any
-				 * grouping-column indexes make_subplanTargetList computed.
+				 * We'll reach the DISTINCT stage without any intermediate
+				 * processing, so figure out whether we will want to hash or not
+				 * so we can choose whether to use cheapest or sorted path.
 				 */
-				locate_grouping_columns(root, tlist, result_plan->targetlist,
-										groupColIdx);
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path->startup_cost,
+										   cheapest_path->total_cost,
+										   sorted_path->startup_cost,
+										   sorted_path->total_cost,
+										   sorted_path->pathkeys,
+										   dNumGroups);
+				tested_hashed_distinct = true;
 			}
 
 			/*
-			 * Insert AGG or GROUP node if needed, plus an explicit sort step
-			 * if necessary.
-			 *
-			 * HAVING clause, if any, becomes qual of the Agg or Group node.
+			 * Select the best path.  If we are doing hashed grouping, we will
+			 * always read all the input tuples, so use the cheapest-total path.
+			 * Otherwise, the comparison above is correct.
 			 */
-			if (use_hashed_grouping)
+			if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
+				best_path = cheapest_path;
+			else
+				best_path = sorted_path;
+
+			/*
+			 * Check to see if it's possible to optimize MIN/MAX aggregates. If
+			 * so, we will forget all the work we did so far to choose a "regular"
+			 * path ... but we had to do it anyway to be able to tell which way is
+			 * cheaper.
+			 */
+			result_plan = optimize_minmax_aggregates(root,
+													 tlist,
+													 &agg_costs,
+													 best_path);
+			if (result_plan != NULL)
 			{
-				/* Hashed aggregate plan --- no sort needed */
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												AGG_HASHED,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-				/* Hashed aggregation produces randomly-ordered results */
+				/*
+				 * optimize_minmax_aggregates generated the full plan, with the
+				 * right tlist, and it has no sort order.
+				 */
 				current_pathkeys = NIL;
 			}
-			else if (parse->hasAggs)
+			else
 			{
-				/* Plain aggregate plan --- sort if needed */
-				AggStrategy aggstrategy;
+				/*
+				 * Normal case --- create a plan according to query_planner's
+				 * results.
+				 */
+				bool		need_sort_for_grouping = false;
+
+				result_plan = create_plan(root, best_path);
+				current_pathkeys = best_path->pathkeys;
 
-				if (parse->groupClause)
+				/* Detect if we'll need an explicit sort for grouping */
+				if (parse->groupClause && !use_hashed_grouping &&
+				  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
 				{
-					if (need_sort_for_grouping)
+					need_sort_for_grouping = true;
+
+					/*
+					 * Always override create_plan's tlist, so that we don't sort
+					 * useless data from a "physical" tlist.
+					 */
+					need_tlist_eval = true;
+				}
+
+				/*
+				 * create_plan returns a plan with just a "flat" tlist of required
+				 * Vars.  Usually we need to insert the sub_tlist as the tlist of
+				 * the top plan node.  However, we can skip that if we determined
+				 * that whatever create_plan chose to return will be good enough.
+				 */
+				if (need_tlist_eval)
+				{
+					/*
+					 * If the top-level plan node is one that cannot do expression
+					 * evaluation and its existing target list isn't already what
+					 * we need, we must insert a Result node to project the
+					 * desired tlist.
+					 */
+					if (!is_projection_capable_plan(result_plan) &&
+						!tlist_same_exprs(sub_tlist, result_plan->targetlist))
 					{
-						result_plan = (Plan *)
-							make_sort_from_groupcols(root,
-													 parse->groupClause,
-													 groupColIdx,
-													 result_plan);
-						current_pathkeys = root->group_pathkeys;
+						result_plan = (Plan *) make_result(root,
+														   sub_tlist,
+														   NULL,
+														   result_plan);
+					}
+					else
+					{
+						/*
+						 * Otherwise, just replace the subplan's flat tlist with
+						 * the desired tlist.
+						 */
+						result_plan->targetlist = sub_tlist;
 					}
-					aggstrategy = AGG_SORTED;
 
 					/*
-					 * The AGG node will not change the sort ordering of its
-					 * groups, so current_pathkeys describes the result too.
+					 * Also, account for the cost of evaluation of the sub_tlist.
+					 * See comments for add_tlist_costs_to_plan() for more info.
 					 */
+					add_tlist_costs_to_plan(root, result_plan, sub_tlist);
 				}
 				else
 				{
-					aggstrategy = AGG_PLAIN;
-					/* Result will be only one row anyway; no sort order */
-					current_pathkeys = NIL;
+					/*
+					 * Since we're using create_plan's tlist and not the one
+					 * make_subplanTargetList calculated, we have to refigure any
+					 * grouping-column indexes make_subplanTargetList computed.
+					 */
+					locate_grouping_columns(root, tlist, result_plan->targetlist,
+											groupColIdx);
 				}
 
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												aggstrategy,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-			}
-			else if (parse->groupClause)
-			{
 				/*
-				 * GROUP BY without aggregation, so insert a group node (plus
-				 * the appropriate sort node, if necessary).
+				 * Insert AGG or GROUP node if needed, plus an explicit sort step
+				 * if necessary.
 				 *
-				 * Add an explicit sort if we couldn't make the path come out
-				 * the way the GROUP node needs it.
+				 * HAVING clause, if any, becomes qual of the Agg or Group node.
 				 */
-				if (need_sort_for_grouping)
+				if (use_hashed_grouping)
 				{
-					result_plan = (Plan *)
-						make_sort_from_groupcols(root,
-												 parse->groupClause,
-												 groupColIdx,
-												 result_plan);
-					current_pathkeys = root->group_pathkeys;
+					/* Hashed aggregate plan --- no sort needed */
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													AGG_HASHED,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+					/* Hashed aggregation produces randomly-ordered results */
+					current_pathkeys = NIL;
 				}
+				else if (parse->hasAggs)
+				{
+					/* Plain aggregate plan --- sort if needed */
+					AggStrategy aggstrategy;
 
-				result_plan = (Plan *) make_group(root,
-												  tlist,
-												  (List *) parse->havingQual,
-												  numGroupCols,
-												  groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												  dNumGroups,
-												  result_plan);
-				/* The Group node won't change sort ordering */
-			}
-			else if (root->hasHavingQual)
-			{
-				/*
-				 * No aggregates, and no GROUP BY, but we have a HAVING qual.
-				 * This is a degenerate case in which we are supposed to emit
-				 * either 0 or 1 row depending on whether HAVING succeeds.
-				 * Furthermore, there cannot be any variables in either HAVING
-				 * or the targetlist, so we actually do not need the FROM
-				 * table at all!  We can just throw away the plan-so-far and
-				 * generate a Result node.  This is a sufficiently unusual
-				 * corner case that it's not worth contorting the structure of
-				 * this routine to avoid having to generate the plan in the
-				 * first place.
-				 */
-				result_plan = (Plan *) make_result(root,
-												   tlist,
-												   parse->havingQual,
-												   NULL);
-			}
-		}						/* end of non-minmax-aggregate case */
-
-		/*
-		 * Since each window function could require a different sort order, we
-		 * stack up a WindowAgg node for each window, with sort steps between
-		 * them as needed.
-		 */
-		if (activeWindows)
-		{
-			List	   *window_tlist;
-			ListCell   *l;
+					if (parse->groupClause)
+					{
+						if (need_sort_for_grouping)
+						{
+							result_plan = (Plan *)
+								make_sort_from_groupcols(root,
+														 parse->groupClause,
+														 groupColIdx,
+														 result_plan);
+							current_pathkeys = root->group_pathkeys;
+						}
+						aggstrategy = AGG_SORTED;
+
+						/*
+						 * The AGG node will not change the sort ordering of its
+						 * groups, so current_pathkeys describes the result too.
+						 */
+					}
+					else
+					{
+						aggstrategy = AGG_PLAIN;
+						/* Result will be only one row anyway; no sort order */
+						current_pathkeys = NIL;
+					}
 
-			/*
-			 * If the top-level plan node is one that cannot do expression
-			 * evaluation, we must insert a Result node to project the desired
-			 * tlist.  (In some cases this might not really be required, but
-			 * it's not worth trying to avoid it.  In particular, think not to
-			 * skip adding the Result if the initial window_tlist matches the
-			 * top-level plan node's output, because we might change the tlist
-			 * inside the following loop.)	Note that on second and subsequent
-			 * passes through the following loop, the top-level node will be a
-			 * WindowAgg which we know can project; so we only need to check
-			 * once.
-			 */
-			if (!is_projection_capable_plan(result_plan))
-			{
-				result_plan = (Plan *) make_result(root,
-												   NIL,
-												   NULL,
-												   result_plan);
-			}
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													aggstrategy,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+				}
+				else if (parse->groupClause)
+				{
+					/*
+					 * GROUP BY without aggregation, so insert a group node (plus
+					 * the appropriate sort node, if necessary).
+					 *
+					 * Add an explicit sort if we couldn't make the path come out
+					 * the way the GROUP node needs it.
+					 */
+					if (need_sort_for_grouping)
+					{
+						result_plan = (Plan *)
+							make_sort_from_groupcols(root,
+													 parse->groupClause,
+													 groupColIdx,
+													 result_plan);
+						current_pathkeys = root->group_pathkeys;
+					}
 
-			/*
-			 * The "base" targetlist for all steps of the windowing process is
-			 * a flat tlist of all Vars and Aggs needed in the result.  (In
-			 * some cases we wouldn't need to propagate all of these all the
-			 * way to the top, since they might only be needed as inputs to
-			 * WindowFuncs.  It's probably not worth trying to optimize that
-			 * though.)  We also add window partitioning and sorting
-			 * expressions to the base tlist, to ensure they're computed only
-			 * once at the bottom of the stack (that's critical for volatile
-			 * functions).  As we climb up the stack, we'll add outputs for
-			 * the WindowFuncs computed at each level.
-			 */
-			window_tlist = make_windowInputTargetList(root,
+					result_plan = (Plan *) make_group(root,
 													  tlist,
-													  activeWindows);
+													  (List *) parse->havingQual,
+													  numGroupCols,
+													  groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													  dNumGroups,
+													  result_plan);
+					/* The Group node won't change sort ordering */
+				}
+				else if (root->hasHavingQual)
+				{
+					/*
+					 * No aggregates, and no GROUP BY, but we have a HAVING qual.
+					 * This is a degenerate case in which we are supposed to emit
+					 * either 0 or 1 row depending on whether HAVING succeeds.
+					 * Furthermore, there cannot be any variables in either HAVING
+					 * or the targetlist, so we actually do not need the FROM
+					 * table at all!  We can just throw away the plan-so-far and
+					 * generate a Result node.  This is a sufficiently unusual
+					 * corner case that it's not worth contorting the structure of
+					 * this routine to avoid having to generate the plan in the
+					 * first place.
+					 */
+					result_plan = (Plan *) make_result(root,
+													   tlist,
+													   parse->havingQual,
+													   NULL);
+				}
+			}						/* end of non-minmax-aggregate case */
 
 			/*
-			 * The copyObject steps here are needed to ensure that each plan
-			 * node has a separately modifiable tlist.  (XXX wouldn't a
-			 * shallow list copy do for that?)
+			 * Since each window function could require a different sort order, we
+			 * stack up a WindowAgg node for each window, with sort steps between
+			 * them as needed.
 			 */
-			result_plan->targetlist = (List *) copyObject(window_tlist);
-
-			foreach(l, activeWindows)
+			if (activeWindows)
 			{
-				WindowClause *wc = (WindowClause *) lfirst(l);
-				List	   *window_pathkeys;
-				int			partNumCols;
-				AttrNumber *partColIdx;
-				Oid		   *partOperators;
-				int			ordNumCols;
-				AttrNumber *ordColIdx;
-				Oid		   *ordOperators;
-
-				window_pathkeys = make_pathkeys_for_window(root,
-														   wc,
-														   tlist);
+				List	   *window_tlist;
+				ListCell   *l;
 
 				/*
-				 * This is a bit tricky: we build a sort node even if we don't
-				 * really have to sort.  Even when no explicit sort is needed,
-				 * we need to have suitable resjunk items added to the input
-				 * plan's tlist for any partitioning or ordering columns that
-				 * aren't plain Vars.  (In theory, make_windowInputTargetList
-				 * should have provided all such columns, but let's not assume
-				 * that here.)	Furthermore, this way we can use existing
-				 * infrastructure to identify which input columns are the
-				 * interesting ones.
+				 * If the top-level plan node is one that cannot do expression
+				 * evaluation, we must insert a Result node to project the desired
+				 * tlist.  (In some cases this might not really be required, but
+				 * it's not worth trying to avoid it.  In particular, think not to
+				 * skip adding the Result if the initial window_tlist matches the
+				 * top-level plan node's output, because we might change the tlist
+				 * inside the following loop.)	Note that on second and subsequent
+				 * passes through the following loop, the top-level node will be a
+				 * WindowAgg which we know can project; so we only need to check
+				 * once.
 				 */
-				if (window_pathkeys)
-				{
-					Sort	   *sort_plan;
-
-					sort_plan = make_sort_from_pathkeys(root,
-														result_plan,
-														window_pathkeys,
-														-1.0);
-					if (!pathkeys_contained_in(window_pathkeys,
-											   current_pathkeys))
-					{
-						/* we do indeed need to sort */
-						result_plan = (Plan *) sort_plan;
-						current_pathkeys = window_pathkeys;
-					}
-					/* In either case, extract the per-column information */
-					get_column_info_for_window(root, wc, tlist,
-											   sort_plan->numCols,
-											   sort_plan->sortColIdx,
-											   &partNumCols,
-											   &partColIdx,
-											   &partOperators,
-											   &ordNumCols,
-											   &ordColIdx,
-											   &ordOperators);
-				}
-				else
+				if (!is_projection_capable_plan(result_plan))
 				{
-					/* empty window specification, nothing to sort */
-					partNumCols = 0;
-					partColIdx = NULL;
-					partOperators = NULL;
-					ordNumCols = 0;
-					ordColIdx = NULL;
-					ordOperators = NULL;
+					result_plan = (Plan *) make_result(root,
+													   NIL,
+													   NULL,
+													   result_plan);
 				}
 
-				if (lnext(l))
-				{
-					/* Add the current WindowFuncs to the running tlist */
-					window_tlist = add_to_flat_tlist(window_tlist,
-										   wflists->windowFuncs[wc->winref]);
-				}
-				else
+				/*
+				 * The "base" targetlist for all steps of the windowing process is
+				 * a flat tlist of all Vars and Aggs needed in the result.  (In
+				 * some cases we wouldn't need to propagate all of these all the
+				 * way to the top, since they might only be needed as inputs to
+				 * WindowFuncs.  It's probably not worth trying to optimize that
+				 * though.)  We also add window partitioning and sorting
+				 * expressions to the base tlist, to ensure they're computed only
+				 * once at the bottom of the stack (that's critical for volatile
+				 * functions).  As we climb up the stack, we'll add outputs for
+				 * the WindowFuncs computed at each level.
+				 */
+				window_tlist = make_windowInputTargetList(root,
+														  tlist,
+														  activeWindows);
+
+				/*
+				 * The copyObject steps here are needed to ensure that each plan
+				 * node has a separately modifiable tlist.  (XXX wouldn't a
+				 * shallow list copy do for that?)
+				 */
+				result_plan->targetlist = (List *) copyObject(window_tlist);
+
+				foreach(l, activeWindows)
 				{
-					/* Install the original tlist in the topmost WindowAgg */
-					window_tlist = tlist;
-				}
+					WindowClause *wc = (WindowClause *) lfirst(l);
+					List	   *window_pathkeys;
+					int			partNumCols;
+					AttrNumber *partColIdx;
+					Oid		   *partOperators;
+					int			ordNumCols;
+					AttrNumber *ordColIdx;
+					Oid		   *ordOperators;
+
+					window_pathkeys = make_pathkeys_for_window(root,
+															   wc,
+															   tlist);
+
+					/*
+					 * This is a bit tricky: we build a sort node even if we don't
+					 * really have to sort.  Even when no explicit sort is needed,
+					 * we need to have suitable resjunk items added to the input
+					 * plan's tlist for any partitioning or ordering columns that
+					 * aren't plain Vars.  (In theory, make_windowInputTargetList
+					 * should have provided all such columns, but let's not assume
+					 * that here.)	Furthermore, this way we can use existing
+					 * infrastructure to identify which input columns are the
+					 * interesting ones.
+					 */
+					if (window_pathkeys)
+					{
+						Sort	   *sort_plan;
+
+						sort_plan = make_sort_from_pathkeys(root,
+															result_plan,
+															window_pathkeys,
+															-1.0);
+						if (!pathkeys_contained_in(window_pathkeys,
+												   current_pathkeys))
+						{
+							/* we do indeed need to sort */
+							result_plan = (Plan *) sort_plan;
+							current_pathkeys = window_pathkeys;
+						}
+						/* In either case, extract the per-column information */
+						get_column_info_for_window(root, wc, tlist,
+												   sort_plan->numCols,
+												   sort_plan->sortColIdx,
+												   &partNumCols,
+												   &partColIdx,
+												   &partOperators,
+												   &ordNumCols,
+												   &ordColIdx,
+												   &ordOperators);
+					}
+					else
+					{
+						/* empty window specification, nothing to sort */
+						partNumCols = 0;
+						partColIdx = NULL;
+						partOperators = NULL;
+						ordNumCols = 0;
+						ordColIdx = NULL;
+						ordOperators = NULL;
+					}
 
-				/* ... and make the WindowAgg plan node */
-				result_plan = (Plan *)
-					make_windowagg(root,
-								   (List *) copyObject(window_tlist),
-								   wflists->windowFuncs[wc->winref],
-								   wc->winref,
-								   partNumCols,
-								   partColIdx,
-								   partOperators,
-								   ordNumCols,
-								   ordColIdx,
-								   ordOperators,
-								   wc->frameOptions,
-								   wc->startOffset,
-								   wc->endOffset,
-								   result_plan);
+					if (lnext(l))
+					{
+						/* Add the current WindowFuncs to the running tlist */
+						window_tlist = add_to_flat_tlist(window_tlist,
+											   wflists->windowFuncs[wc->winref]);
+					}
+					else
+					{
+						/* Install the original tlist in the topmost WindowAgg */
+						window_tlist = tlist;
+					}
+
+					/* ... and make the WindowAgg plan node */
+					result_plan = (Plan *)
+						make_windowagg(root,
+									   (List *) copyObject(window_tlist),
+									   wflists->windowFuncs[wc->winref],
+									   wc->winref,
+									   partNumCols,
+									   partColIdx,
+									   partOperators,
+									   ordNumCols,
+									   ordColIdx,
+									   ordOperators,
+									   wc->frameOptions,
+									   wc->startOffset,
+									   wc->endOffset,
+									   result_plan);
+				}
 			}
-		}
+
+			result_plan_list = lappend(result_plan_list, result_plan);
+		}						 /* foreach final_rel_list */
 	}							/* end of if (setOperations) */
 
-	/*
-	 * If there is a DISTINCT clause, add the necessary node(s).
-	 */
-	if (parse->distinctClause)
+	foreach(lc, result_plan_list)
 	{
-		double		dNumDistinctRows;
-		long		numDistinctRows;
+		result_plan = (Plan *) lfirst(lc);
 
 		/*
-		 * If there was grouping or aggregation, use the current number of
-		 * rows as the estimated number of DISTINCT rows (ie, assume the
-		 * result was already mostly unique).  If not, use the number of
-		 * distinct-groups calculated previously.
+		 * If there is a DISTINCT clause, add the necessary node(s).
 		 */
-		if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
-			dNumDistinctRows = result_plan->plan_rows;
-		else
-			dNumDistinctRows = dNumGroups;
-
-		/* Also convert to long int --- but 'ware overflow! */
-		numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
-
-		/* Choose implementation method if we didn't already */
-		if (!tested_hashed_distinct)
+		if (parse->distinctClause)
 		{
-			/*
-			 * At this point, either hashed or sorted grouping will have to
-			 * work from result_plan, so we pass that as both "cheapest" and
-			 * "sorted".
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   result_plan->plan_rows,
-									   result_plan->plan_width,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   current_pathkeys,
-									   dNumDistinctRows);
-		}
+			double		dNumDistinctRows;
+			long		numDistinctRows;
 
-		if (use_hashed_distinct)
-		{
-			/* Hashed aggregate plan --- no sort needed */
-			result_plan = (Plan *) make_agg(root,
-											result_plan->targetlist,
-											NIL,
-											AGG_HASHED,
-											NULL,
-										  list_length(parse->distinctClause),
-								 extract_grouping_cols(parse->distinctClause,
-													result_plan->targetlist),
-								 extract_grouping_ops(parse->distinctClause),
-											numDistinctRows,
-											result_plan);
-			/* Hashed aggregation produces randomly-ordered results */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Use a Unique node to implement DISTINCT.  Add an explicit sort
-			 * if we couldn't make the path come out the way the Unique node
-			 * needs it.  If we do have to sort, always sort by the more
-			 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
-			 * below.  However, for regular DISTINCT, don't sort now if we
-			 * don't have to --- sorting afterwards will likely be cheaper,
-			 * and also has the possibility of optimizing via LIMIT.  But for
-			 * DISTINCT ON, we *must* force the final sort now, else it won't
-			 * have the desired behavior.
+			 * If there was grouping or aggregation, use the current number of
+			 * rows as the estimated number of DISTINCT rows (ie, assume the
+			 * result was already mostly unique).  If not, use the number of
+			 * distinct-groups calculated previously.
 			 */
-			List	   *needed_pathkeys;
-
-			if (parse->hasDistinctOn &&
-				list_length(root->distinct_pathkeys) <
-				list_length(root->sort_pathkeys))
-				needed_pathkeys = root->sort_pathkeys;
+			if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
+				dNumDistinctRows = result_plan->plan_rows;
 			else
-				needed_pathkeys = root->distinct_pathkeys;
+				dNumDistinctRows = dNumGroups;
+
+			/* Also convert to long int --- but 'ware overflow! */
+			numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
+
+			/* Choose implementation method if we didn't already */
+			if (!tested_hashed_distinct)
+			{
+				/*
+				 * At this point, either hashed or sorted grouping will have to
+				 * work from result_plan, so we pass that as both "cheapest" and
+				 * "sorted".
+				 */
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   result_plan->plan_rows,
+										   result_plan->plan_width,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   current_pathkeys,
+										   dNumDistinctRows);
+			}
 
-			if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
+			if (use_hashed_distinct)
+			{
+				/* Hashed aggregate plan --- no sort needed */
+				result_plan = (Plan *) make_agg(root,
+												result_plan->targetlist,
+												NIL,
+												AGG_HASHED,
+												NULL,
+											  list_length(parse->distinctClause),
+									 extract_grouping_cols(parse->distinctClause,
+														result_plan->targetlist),
+									 extract_grouping_ops(parse->distinctClause),
+												numDistinctRows,
+												result_plan);
+				/* Hashed aggregation produces randomly-ordered results */
+				current_pathkeys = NIL;
+			}
+			else
 			{
-				if (list_length(root->distinct_pathkeys) >=
+				/*
+				 * Use a Unique node to implement DISTINCT.  Add an explicit sort
+				 * if we couldn't make the path come out the way the Unique node
+				 * needs it.  If we do have to sort, always sort by the more
+				 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
+				 * below.  However, for regular DISTINCT, don't sort now if we
+				 * don't have to --- sorting afterwards will likely be cheaper,
+				 * and also has the possibility of optimizing via LIMIT.  But for
+				 * DISTINCT ON, we *must* force the final sort now, else it won't
+				 * have the desired behavior.
+				 */
+				List	   *needed_pathkeys;
+
+				if (parse->hasDistinctOn &&
+					list_length(root->distinct_pathkeys) <
 					list_length(root->sort_pathkeys))
-					current_pathkeys = root->distinct_pathkeys;
+					needed_pathkeys = root->sort_pathkeys;
 				else
+					needed_pathkeys = root->distinct_pathkeys;
+
+				if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
 				{
-					current_pathkeys = root->sort_pathkeys;
-					/* Assert checks that parser didn't mess up... */
-					Assert(pathkeys_contained_in(root->distinct_pathkeys,
-												 current_pathkeys));
+					if (list_length(root->distinct_pathkeys) >=
+						list_length(root->sort_pathkeys))
+						current_pathkeys = root->distinct_pathkeys;
+					else
+					{
+						current_pathkeys = root->sort_pathkeys;
+						/* Assert checks that parser didn't mess up... */
+						Assert(pathkeys_contained_in(root->distinct_pathkeys,
+													 current_pathkeys));
+					}
+
+					result_plan = (Plan *) make_sort_from_pathkeys(root,
+																   result_plan,
+																current_pathkeys,
+																   -1.0);
 				}
 
+				result_plan = (Plan *) make_unique(result_plan,
+												   parse->distinctClause);
+				result_plan->plan_rows = dNumDistinctRows;
+				/* The Unique node won't change sort ordering */
+			}
+		}
+
+		/*
+		 * If ORDER BY was given and we were not able to make the plan come out in
+		 * the right order, add an explicit sort step.
+		 */
+		if (parse->sortClause)
+		{
+			if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+			{
 				result_plan = (Plan *) make_sort_from_pathkeys(root,
 															   result_plan,
-															current_pathkeys,
-															   -1.0);
+															 root->sort_pathkeys,
+															   limit_tuples);
+				current_pathkeys = root->sort_pathkeys;
 			}
-
-			result_plan = (Plan *) make_unique(result_plan,
-											   parse->distinctClause);
-			result_plan->plan_rows = dNumDistinctRows;
-			/* The Unique node won't change sort ordering */
 		}
-	}
 
-	/*
-	 * If ORDER BY was given and we were not able to make the plan come out in
-	 * the right order, add an explicit sort step.
-	 */
-	if (parse->sortClause)
-	{
-		if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+		/*
+		 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
+		 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
+		 * If there are only non-locking rowmarks, they should be handled by the
+		 * ModifyTable node instead.)
+		 */
+		if (parse->rowMarks)
 		{
-			result_plan = (Plan *) make_sort_from_pathkeys(root,
-														   result_plan,
-														 root->sort_pathkeys,
-														   limit_tuples);
-			current_pathkeys = root->sort_pathkeys;
-		}
-	}
+			result_plan = (Plan *) make_lockrows(result_plan,
+												 root->rowMarks,
+												 SS_assign_special_param(root));
 
-	/*
-	 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
-	 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
-	 * If there are only non-locking rowmarks, they should be handled by the
-	 * ModifyTable node instead.)
-	 */
-	if (parse->rowMarks)
-	{
-		result_plan = (Plan *) make_lockrows(result_plan,
-											 root->rowMarks,
-											 SS_assign_special_param(root));
+			/*
+			 * The result can no longer be assumed sorted, since locking might
+			 * cause the sort key columns to be replaced with new values.
+			 */
+			current_pathkeys = NIL;
+		}
 
 		/*
-		 * The result can no longer be assumed sorted, since locking might
-		 * cause the sort key columns to be replaced with new values.
+		 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
 		 */
-		current_pathkeys = NIL;
-	}
+		if (limit_needed(parse))
+		{
+			result_plan = (Plan *) make_limit(result_plan,
+											  parse->limitOffset,
+											  parse->limitCount,
+											  offset_est,
+											  count_est);
+		}
 
-	/*
-	 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
-	 */
-	if (limit_needed(parse))
-	{
-		result_plan = (Plan *) make_limit(result_plan,
-										  parse->limitOffset,
-										  parse->limitCount,
-										  offset_est,
-										  count_est);
-	}
+		lfirst(lc) = result_plan;
+	} /* foreach all_plans */
 
 	/*
 	 * Return the actual output ordering in query_pathkeys for possible use by
@@ -1999,7 +2019,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	 */
 	root->query_pathkeys = current_pathkeys;
 
-	return result_plan;
+	/* if there is only one plan, then just return that plan */
+	if (list_length(result_plan_list) == 1)
+		return (Plan *) linitial(result_plan_list);
+
+	/*
+	 * Otherwise we'd better add an AlternativePlan node to allow the executor
+	 * to decide which plan to use.
+	 */
+	else
+		return (Plan *) make_alternativeplan(result_plan_list);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 4d3fbca..042f8b1 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -435,6 +435,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 	 */
 	switch (nodeTag(plan))
 	{
+		case T_AlternativePlan:
+			{
+				AlternativePlan *aplan = (AlternativePlan *) plan;
+				ListCell *lc;
+				foreach(lc, aplan->planList)
+				{
+					Plan *plan = (Plan *) lfirst(lc);
+					set_plan_refs(root, plan, rtoffset);
+				}
+			}
+			break;
 		case T_SeqScan:
 			{
 				SeqScan    *splan = (SeqScan *) plan;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index b2becfa..fea198e 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -25,7 +25,9 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_constraint.h"
 #include "catalog/heap.h"
+#include "catalog/pg_type.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -38,6 +40,7 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -89,6 +92,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *fkinfos = NIL;
+	Relation	fkeyRel;
+	Relation	fkeyRelIdx;
+	ScanKeyData fkeyScankey;
+	SysScanDesc fkeyScan;
+	HeapTuple	tuple;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -384,6 +393,111 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	heap_close(relation, NoLock);
 
+	/* load foreign key constraints */
+	ScanKeyInit(&fkeyScankey,
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relationObjectId));
+
+	fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
+	fkeyRelIdx = index_open(ConstraintRelidIndexId, AccessShareLock);
+	fkeyScan = systable_beginscan_ordered(fkeyRel, fkeyRelIdx, NULL, 1, &fkeyScankey);
+
+	while ((tuple = systable_getnext_ordered(fkeyScan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
+		ForeignKeyInfo *fkinfo;
+		Datum		adatum;
+		bool		isNull;
+		ArrayType  *arr;
+		int			nelements;
+
+		/* skip if not a foreign key */
+		if (con->contype != CONSTRAINT_FOREIGN)
+			continue;
+
+		/* we're not interested unless the fkey has been validated */
+		if (!con->convalidated)
+			continue;
+
+		fkinfo = (ForeignKeyInfo *) palloc(sizeof(ForeignKeyInfo));
+		fkinfo->conindid = con->conindid;
+		fkinfo->confrelid = con->confrelid;
+		fkinfo->convalidated = con->convalidated;
+		fkinfo->conrelid = con->conrelid;
+		fkinfo->confupdtype = con->confupdtype;
+		fkinfo->confdeltype = con->confdeltype;
+		fkinfo->confmatchtype = con->confmatchtype;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "conkey is not a 1-D smallint array");
+
+		fkinfo->conkey = (int16 *) ARR_DATA_PTR(arr);
+		fkinfo->conncols = nelements;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_confkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null confkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "confkey is not a 1-D smallint array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of confkey elements does not equal conkey elements");
+
+		fkinfo->confkey = (int16 *) ARR_DATA_PTR(arr);
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conpfeqop,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conpfeqop for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != OIDOID)
+			elog(ERROR, "conpfeqop is not a 1-D Oid array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of conpfeqop elements does not equal conkey elements");
+
+		fkinfo->conpfeqop = (Oid *) ARR_DATA_PTR(arr);
+
+		fkinfos = lappend(fkinfos, fkinfo);
+	}
+
+	rel->fklist = fkinfos;
+	systable_endscan_ordered(fkeyScan);
+	index_close(fkeyRelIdx, AccessShareLock);
+	heap_close(fkeyRel, AccessShareLock);
+
 	/*
 	 * Allow a plugin to editorialize on the info we obtained from the
 	 * catalogs.  Actions might include altering the assumed relation size,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 4c76f54..349f330 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -115,6 +115,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->lateral_relids = NULL;
 	rel->lateral_referencers = NULL;
 	rel->indexlist = NIL;
+	rel->fklist = NIL;
 	rel->pages = 0;
 	rel->tuples = 0;
 	rel->allvisfrac = 0;
@@ -127,6 +128,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->baserestrictcost.startup = 0;
 	rel->baserestrictcost.per_tuple = 0;
 	rel->joininfo = NIL;
+	rel->skipFlags = PLAN_SUITABILITY_ALL_PURPOSE;
 	rel->has_eclass_joins = false;
 
 	/* Check type of rtable entry */
@@ -377,6 +379,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->lateral_relids = NULL;
 	joinrel->lateral_referencers = NULL;
 	joinrel->indexlist = NIL;
+	joinrel->fklist = NIL;
 	joinrel->pages = 0;
 	joinrel->tuples = 0;
 	joinrel->allvisfrac = 0;
@@ -389,6 +392,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrictcost.startup = 0;
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
+	joinrel->skipFlags = PLAN_SUITABILITY_ALL_PURPOSE;
 	joinrel->has_eclass_joins = false;
 
 	/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 73138e0..db0f90a 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -916,6 +916,33 @@ get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 	ReleaseSysCache(tp);
 }
 
+/*
+ * get_attnotnull
+ *
+ *		Given the relation id and the attribute number,
+ *		return the "attnotnull" field from the attribute relation.
+ */
+bool
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache2(ATTNUM,
+						 ObjectIdGetDatum(relid),
+						 Int16GetDatum(attnum));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+		bool		result;
+
+		result = att_tup->attnotnull;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return false;
+}
+
 /*				---------- COLLATION CACHE ----------					 */
 
 /*
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index d0b0356..34a75e4 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -181,6 +181,7 @@ extern void ExecBSTruncateTriggers(EState *estate,
 extern void ExecASTruncateTriggers(EState *estate,
 					   ResultRelInfo *relinfo);
 
+extern bool AfterTriggerQueueIsEmpty(void);
 extern void AfterTriggerBeginXact(void);
 extern void AfterTriggerBeginQuery(void);
 extern void AfterTriggerEndQuery(EState *estate);
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index bc71fea..0e45f8e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -77,6 +77,7 @@ typedef enum NodeTag
 	T_SetOp,
 	T_LockRows,
 	T_Limit,
+	T_AlternativePlan,
 	/* these aren't subclasses of Plan: */
 	T_NestLoopParam,
 	T_PlanRowMark,
@@ -123,6 +124,7 @@ typedef enum NodeTag
 	T_SetOpState,
 	T_LockRowsState,
 	T_LimitState,
+	T_AlternativePlanState,
 
 	/*
 	 * TAGS FOR PRIMITIVE NODES (primnodes.h)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5eaa435..f1438db 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -734,6 +734,10 @@ typedef enum RTEKind
 	RTE_CTE						/* common table expr (WITH list element) */
 } RTEKind;
 
+/* Bit flags to mark suitability of plans */
+#define PLAN_SUITABILITY_ALL_PURPOSE		0
+#define PLAN_SUITABILITY_FK_TRIGGER_EMPTY	1
+
 typedef struct RangeTblEntry
 {
 	NodeTag		type;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..9169d36 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,8 +70,11 @@ typedef struct PlannedStmt
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
+	int			suitableFor; /* under which conditions can this plan be used */
+
 	bool		hasRowSecurity;	/* row security applied? */
 
+
 } PlannedStmt;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -767,6 +770,20 @@ typedef struct LockRows
 	int			epqParam;		/* ID of Param for EvalPlanQual re-eval */
 } LockRows;
 
+
+/* ----------------
+ *		alternative plan node
+ *
+ * Stores a list of alternative plans and one
+ * all purpose plan.
+ * ----------------
+ */
+typedef struct AlternativePlan
+{
+	Plan		plan;
+	List	   *planList;
+} AlternativePlan;
+
 /* ----------------
  *		limit node
  *
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 7116496..6cfaecd 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -95,6 +95,8 @@ typedef struct PlannerGlobal
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
+	int			suitableFor; /* under which conditions can this plan be used */
+
 	Index		lastPHId;		/* highest PlaceHolderVar ID assigned */
 
 	Index		lastRowMarkId;	/* highest PlanRowMark ID assigned */
@@ -103,6 +105,7 @@ typedef struct PlannerGlobal
 
 	bool		hasRowSecurity;	/* row security applied? */
 
+
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -359,6 +362,8 @@ typedef struct PlannerInfo
  *		lateral_referencers - relids of rels that reference this one laterally
  *		indexlist - list of IndexOptInfo nodes for relation's indexes
  *					(always NIL if it's not a table)
+ *		fklist - list of ForeignKeyInfo's for relation's foreign key
+ *					constraints. (always NIL if it's not a table)
  *		pages - number of disk pages in relation (zero if not a table)
  *		tuples - number of tuples in relation (not considering restrictions)
  *		allvisfrac - fraction of disk pages that are marked all-visible
@@ -452,6 +457,7 @@ typedef struct RelOptInfo
 	Relids		lateral_relids; /* minimum parameterization of rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
 	List	   *indexlist;		/* list of IndexOptInfo */
+	List	   *fklist;			/* list of ForeignKeyInfo */
 	BlockNumber pages;			/* size estimates derived from pg_class */
 	double		tuples;
 	double		allvisfrac;
@@ -469,6 +475,8 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;		/* cost of evaluating the above */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
+	int			skipFlags;		/* it may be possible to not bother joining
+								 * this relation at all */
 	bool		has_eclass_joins;		/* T means joininfo is incomplete */
 } RelOptInfo;
 
@@ -542,6 +550,51 @@ typedef struct IndexOptInfo
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 } IndexOptInfo;
 
+/*
+ * ForeignKeyInfo
+ *		Used to store pg_constraint records for foreign key constraints for use
+ *		by the planner.
+ *
+ *		conindid - The index which supports the foreign key
+ *
+ *		confrelid - The relation that is referenced by this foreign key
+ *
+ *		convalidated - True if the foreign key has been validated.
+ *
+ *		conrelid - The Oid of the relation that the foreign key belongs to
+ *
+ *		confupdtype - ON UPDATE action for when the referenced table is updated
+ *
+ *		confdeltype - ON DELETE action, controls what to do when a record is
+ *					deleted from the referenced table.
+ *
+ *		confmatchtype - foreign key match type, e.g MATCH FULL, MATCH PARTIAL
+ *
+ *		conncols - Number of columns defined in the foreign key
+ *
+ *		conkey - An array of conncols elements to store the varattno of the
+ *					columns on the referencing side of the foreign key
+ *
+ *		confkey - An array of conncols elements to store the varattno of the
+ *					columns on the referenced side of the foreign key
+ *
+ *		conpfeqop - An array of conncols elements to store the operators for
+ *					PK = FK comparisons
+ */
+typedef struct ForeignKeyInfo
+{
+	Oid			conindid;		/* index supporting this constraint */
+	Oid			confrelid;		/* relation referenced by foreign key */
+	bool		convalidated;	/* constraint has been validated? */
+	Oid			conrelid;		/* relation this constraint constrains */
+	char		confupdtype;	/* foreign key's ON UPDATE action */
+	char		confdeltype;	/* foreign key's ON DELETE action */
+	char		confmatchtype;	/* foreign key's match type */
+	int			conncols;		/* number of columns references */
+	int16	   *conkey;			/* Columns of conrelid that the constraint applies to */
+	int16	   *confkey;		/* columns of confrelid that foreign key references */
+	Oid		   *conpfeqop;		/* Operator list for comparing PK to FK */
+} ForeignKeyInfo;
 
 /*
  * EquivalenceClasses
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index afa5f9b..ff9f4cd 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -37,7 +37,8 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
 
-extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
+extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist,
+								int skipflags);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
@@ -119,6 +120,8 @@ extern List *generate_join_implied_equalities(PlannerInfo *root,
 								 Relids join_relids,
 								 Relids outer_relids,
 								 RelOptInfo *inner_rel);
+extern Oid select_equality_operator(EquivalenceClass *ec, Oid lefttype,
+								 Oid righttype);
 extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 						   AppendRelInfo *appinfo,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 3fdc2cb..0d2ebb9 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -27,8 +27,9 @@ typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
 /*
  * prototypes for plan/planmain.c
  */
-extern RelOptInfo *query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra);
+extern List *query_planner(PlannerInfo *root, List *tlist,
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool alternativePlans);
 
 /*
  * prototypes for plan/planagg.c
@@ -73,6 +74,7 @@ extern Group *make_group(PlannerInfo *root, List *tlist, List *qual,
 extern Plan *materialize_finished_plan(Plan *subplan);
 extern Unique *make_unique(Plan *lefttree, List *distinctList);
 extern LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
+extern AlternativePlan *make_alternativeplan(List *planlist);
 extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,
 		   int64 offset_est, int64 count_est);
 extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 1a556f8..1cb9970 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -68,6 +68,7 @@ extern Oid	get_atttype(Oid relid, AttrNumber attnum);
 extern int32 get_atttypmod(Oid relid, AttrNumber attnum);
 extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 					  Oid *typid, int32 *typmod, Oid *collid);
+extern bool get_attnotnull(Oid relid, AttrNumber attnum);
 extern char *get_collation_name(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2501184..e485554 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3276,6 +3276,171 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 (1 row)
 
 rollback;
+begin work;
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+ id | b_id 
+----+------
+  3 |    2
+(1 row)
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (val = 10)
+(6 rows)
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (id = val)
+(6 rows)
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+         QUERY PLAN         
+----------------------------
+ Hash Join
+   Hash Cond: (a.id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+          QUERY PLAN          
+------------------------------
+ Hash Right Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+        QUERY PLAN         
+---------------------------
+ Nested Loop
+   ->  Seq Scan on c
+   ->  Materialize
+         ->  Seq Scan on a
+(4 rows)
+
+set constraints b_c_id_fkey deferred;
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+explain (costs off)
+execute ab;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+explain (costs off)
+execute ab;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+rollback;
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 718e1d9..c3ee72e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -977,6 +977,89 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 
 rollback;
 
+begin work;
+
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+
+set constraints b_c_id_fkey deferred;
+
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+rollback;
+
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);

#44

David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: David Rowley (#43)

1 attachment(s)

Re: Removing INNER JOINs

On 10 December 2014 at 23:04, David Rowley <dgrowleyml@gmail.com> wrote:

The bulk of my changes are in allpaths.c, planmain.c and planner.c. The
critical change is that query_planner() now returns a List instead of
a RelOptInfo. I wasn't quite sure how else to handle this. Please also
notice the change to make_one_rel(). This function is now called twice if
remove_useless_joins() found one or more INNER JOINs that may be
removable. In remove_useless_joins() the rels are not marked as
RELOPT_DEADREL like they are with LEFT JOINs; they remain as
RELOPT_BASEREL, but have skipFlags set to mark that they can be
removed when there are no FK triggers pending. A flag on PlannerGlobal is
also set, which later forces make_one_rel() to be called twice: once for
the join removal plan, and once for the "All Purpose" plan. query_planner()
then returns a list of the RelOptInfos of those 2 final rels created by
make_one_rel(). All the processing that was previously done on that final
rel is now done on each rel in the list of final rels. If there's more
than 1 in that list then I make the root node of the plan an
"AlternativePlan" node. When this node is initialised at executor startup,
some logic chooses which plan to execute. The code there just calls
ExecInitNode() on the root node of the selected plan and returns that,
thus skipping over the AlternativePlan node, so that it can't be seen by
EXPLAIN or EXPLAIN ANALYZE.
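
To make the executor-side choice described above a bit more concrete, here
is a minimal standalone C sketch. It is not code from the patch:
SketchPlan, current_conditions() and choose_alternative_plan() are
hypothetical names, and keeping a suitableFor value on each candidate plan
is an assumption made purely for illustration (the patch itself adds
suitableFor to PlannedStmt and PlannerGlobal). Only the two
PLAN_SUITABILITY_* values are taken from the patch. The boolean argument
stands in for whatever AfterTriggerQueueIsEmpty() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit flags with the same values the patch adds to parsenodes.h. */
#define PLAN_SUITABILITY_ALL_PURPOSE		0
#define PLAN_SUITABILITY_FK_TRIGGER_EMPTY	1

/* Hypothetical stand-in for the root of one candidate plan tree. */
typedef struct SketchPlan
{
	const char *label;			/* only used to tell the plans apart */
	int			suitableFor;	/* conditions this plan relies on (assumed) */
} SketchPlan;

/* Hypothetical helper: which conditions hold at executor startup? */
static int
current_conditions(bool fk_trigger_queue_empty)
{
	int			flags = PLAN_SUITABILITY_ALL_PURPOSE;

	if (fk_trigger_queue_empty)
		flags |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
	return flags;
}

/*
 * Return the first plan whose required conditions are all satisfied.
 * A plan that requires nothing (ALL_PURPOSE) always qualifies, so placing
 * it last in the array makes it the fallback.
 */
static const SketchPlan *
choose_alternative_plan(const SketchPlan *plans, size_t nplans,
						bool fk_trigger_queue_empty)
{
	int			have = current_conditions(fk_trigger_queue_empty);
	size_t		i;

	assert(nplans > 0);

	for (i = 0; i < nplans; i++)
	{
		/* no required bit may be missing from the bits we have */
		if ((plans[i].suitableFor & ~have) == 0)
			return &plans[i];
	}
	return NULL;				/* only reachable without a fallback plan */
}
```

With the join removal plan listed first and the all purpose plan second,
this sketch picks the first plan while the trigger queue is empty and falls
back to the second one otherwise. In the real patch the selected plan's
root would then be handed to ExecInitNode(), as described above.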

I had left the previous patch a bit unpolished.
In the attached I've created a new source file for nodeAlternativePlan and
also performed various cosmetic cleanups.

I'd be keen to know what people's thoughts are about the
nodeAlternativePlan only surviving until the plan is initialised.

Regards

David Rowley

Attachments:

inner_join_removals_2015-01-09_1dd478e.patch (application/octet-stream)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 4899a27..43bc425 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3889,6 +3889,17 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	return all_fired;
 }
 
+/* ----------
+ * AfterTriggerQueueIsEmpty()
+ *
+ *	True if there are no pending triggers in the queue.
+ * ----------
+ */
+bool
+AfterTriggerQueueIsEmpty(void)
+{
+	return (afterTriggers.query_depth == -1 && afterTriggers.events.head == NULL);
+}
 
 /* ----------
  * AfterTriggerBeginXact()
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..bfbd5b3 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -14,8 +14,8 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
-       execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       execUtils.o functions.o instrument.o nodeAlternativePlan.o nodeAppend.o \
+       nodeAgg.o nodeBitmapAnd.o nodeBitmapOr.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeCustom.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..523e187 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -79,6 +79,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
+#include "executor/nodeAlternativePlan.h"
 #include "executor/nodeAppend.h"
 #include "executor/nodeBitmapAnd.h"
 #include "executor/nodeBitmapHeapscan.h"
@@ -147,6 +148,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 			/*
 			 * control nodes
 			 */
+		case T_AlternativePlan:
+			result = (PlanState *) ExecInitAlternativePlan((AlternativePlan *)node,
+												  estate, eflags);
+			break;
+
 		case T_Result:
 			result = (PlanState *) ExecInitResult((Result *) node,
 												  estate, eflags);
diff --git a/src/backend/executor/nodeAlternativePlan.c b/src/backend/executor/nodeAlternativePlan.c
new file mode 100644
index 0000000..cafe33a
--- /dev/null
+++ b/src/backend/executor/nodeAlternativePlan.c
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeAlternativePlan.c
+ *	  Node to support storage of alternative plans.
+ *
+ *	  Note that this node is rather special as it only exists while the plan
+ *	  is being initialised.
+ *
+ *	  When the initialization method is called for this node, a decision is
+ *	  made to decide which plan should be initialized, the code here then calls
+ *	  the initialize method on the selected plan and returns the state value
+ *	  from the root node of that plan.
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeAlternativePlan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/trigger.h"
+
+#include "executor/executor.h"
+#include "executor/nodeAlternativePlan.h"
+
+PlanState *
+ExecInitAlternativePlan(AlternativePlan *node, EState *estate, int eflags)
+{
+	/*
+	 * If we have items in the fk trigger queue, then we'd better use the
+	 * all purpose plan. Since an AlternativePlan node has no state, we simply
+	 * just initialize the root node of the selected plan. This means that the
+	 * AlternativePlan node is *never* seen in EXPLAIN or EXPLAIN ANALYZE.
+	 */
+	if (!AfterTriggerQueueIsEmpty())
+		return (PlanState *) ExecInitNode((Plan *) list_nth(node->planList, 1),
+											estate, eflags);
+
+	/*
+	 * Otherwise we initialize the root node of the optimized plan and return
+	 * that.
+	 */
+	else
+		return (PlanState *) ExecInitNode((Plan *) linitial(node->planList),
+											estate, eflags);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index f1a24f5..3cd1a4e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -93,6 +93,7 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_NODE_FIELD(relationOids);
 	COPY_NODE_FIELD(invalItems);
 	COPY_SCALAR_FIELD(nParamExec);
+	COPY_SCALAR_FIELD(suitableFor);
 
 	return newnode;
 }
@@ -963,6 +964,16 @@ _copyLimit(const Limit *from)
 	return newnode;
 }
 
+static AlternativePlan *
+_copyAlternativePlan(const AlternativePlan *from)
+{
+	AlternativePlan *newnode = makeNode(AlternativePlan);
+
+	COPY_NODE_FIELD(planList);
+
+	return newnode;
+}
+
 /*
  * _copyNestLoopParam
  */
@@ -4117,6 +4128,9 @@ copyObject(const void *from)
 		case T_Limit:
 			retval = _copyLimit(from);
 			break;
+		case T_AlternativePlan:
+			retval = _copyAlternativePlan(from);
+			break;
 		case T_NestLoopParam:
 			retval = _copyNestLoopParam(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index dd1278b..9824b3d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -255,6 +255,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_FIELD(relationOids);
 	WRITE_NODE_FIELD(invalItems);
 	WRITE_INT_FIELD(nParamExec);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 /*
@@ -1716,6 +1717,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
 	WRITE_UINT_FIELD(lastPHId);
 	WRITE_UINT_FIELD(lastRowMarkId);
 	WRITE_BOOL_FIELD(transientPlan);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 static void
@@ -1801,6 +1803,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	/* we don't try to print fdwroutine or fdw_private */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_NODE_FIELD(joininfo);
+	WRITE_INT_FIELD(removal_flags);
 	WRITE_BOOL_FIELD(has_eclass_joins);
 }
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..69990a2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,7 +97,8 @@ static void set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					   RangeTblEntry *rte);
-static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
+static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist,
+					   int removal_flags);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
 						  pushdown_safety_info *safetyInfo);
 static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
@@ -122,7 +123,7 @@ static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
  *	  single rel that represents the join of all base rels in the query.
  */
 RelOptInfo *
-make_one_rel(PlannerInfo *root, List *joinlist)
+make_one_rel(PlannerInfo *root, List *joinlist, int removal_flags)
 {
 	RelOptInfo *rel;
 	Index		rti;
@@ -142,7 +143,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 		Assert(brel->relid == rti);		/* sanity check on array */
 
 		/* ignore RTEs that are "other rels" */
-		if (brel->reloptkind != RELOPT_BASEREL)
+		if (brel->reloptkind != RELOPT_BASEREL ||
+			brel->removal_flags & removal_flags)
 			continue;
 
 		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
@@ -157,12 +159,13 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	/*
 	 * Generate access paths for the entire join tree.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	rel = make_rel_from_joinlist(root, joinlist, removal_flags);
+
 
 	/*
 	 * The result should join all and only the query's base rels.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	Assert(bms_is_subset(root->all_baserels, rel->relids));
 
 	return rel;
 }
@@ -1496,7 +1499,7 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * data structure.
  */
 static RelOptInfo *
-make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
+make_rel_from_joinlist(PlannerInfo *root, List *joinlist, int removal_flags)
 {
 	int			levels_needed;
 	List	   *initial_rels;
@@ -1528,11 +1531,23 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 			int			varno = ((RangeTblRef *) jlnode)->rtindex;
 
 			thisrel = find_base_rel(root, varno);
+
+			/*
+			 * If this relation can be removed for these removal_flags, then
+			 * we'll not bother including this in the list of relations to join
+			 * to
+			 */
+			if ((thisrel->removal_flags & removal_flags))
+			{
+				/* one less level needed too */
+				levels_needed--;
+				continue;
+			}
 		}
 		else if (IsA(jlnode, List))
 		{
 			/* Recurse to handle subproblem */
-			thisrel = make_rel_from_joinlist(root, (List *) jlnode);
+			thisrel = make_rel_from_joinlist(root, (List *) jlnode, removal_flags);
 		}
 		else
 		{
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index eb65c97..8ddc9db 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -49,8 +49,6 @@ static List *generate_join_implied_equalities_broken(PlannerInfo *root,
 										Relids outer_relids,
 										Relids nominal_inner_relids,
 										RelOptInfo *inner_rel);
-static Oid select_equality_operator(EquivalenceClass *ec,
-						 Oid lefttype, Oid righttype);
 static RestrictInfo *create_join_clause(PlannerInfo *root,
 				   EquivalenceClass *ec, Oid opno,
 				   EquivalenceMember *leftem,
@@ -1282,7 +1280,7 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
  *
  * Returns InvalidOid if no operator can be found for this datatype combination
  */
-static Oid
+Oid
 select_equality_operator(EquivalenceClass *ec, Oid lefttype, Oid righttype)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 11d3933..e6bfe37 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -32,13 +32,21 @@
 #include "utils/lsyscache.h"
 
 /* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					  RangeTblRef *removalrtr, Relids ignoredrels);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool relation_is_needed(PlannerInfo *root, Relids joinrelids,
+					  RelOptInfo *rel, Relids ignoredrels);
+static bool relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+					  RelOptInfo *referencedrel, List *referencing_vars,
+					  List *index_vars, List *operator_list);
+static bool expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					  List *indexvars, List *operators);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 					  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
 
-
 /*
  * remove_useless_joins
  *		Check for relations that don't actually need to be joined at all,
@@ -46,26 +54,104 @@ static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  *
  * We are passed the current joinlist and return the updated list.  Other
  * data structures that have to be updated are accessible via "root".
+ *
+ * There are 2 methods here for removing joins. Joins such as LEFT JOINs
+ * which can be proved to be needless due to lack of use of any of the joining
+ * relation's columns and the existence of a unique index on a subset of the
+ * join clause, can simply be removed from the query plan at plan time. For
+ * certain other join types we make use of foreign keys to attempt to prove the
+ * join is needless, though, for these we're unable to be certain that the join
+ * is not required at plan time, as if the plan is executed when pending
+ * foreign key triggers have not yet been fired, then the foreign key is
+ * effectively violated until these triggers have fired. Removing a join in
+ * such a case could cause a query to produce incorrect results.
+ *
+ * Instead we handle this case by marking the RangeTblEntry for the relation
+ * with a special flag which tells the executor that it's possible that joining
+ * to this relation may not be required. The executor may then check this flag
+ * and choose to skip the join based on if there are foreign key triggers
+ * pending or not.
  */
 List *
 remove_useless_joins(PlannerInfo *root, List *joinlist)
 {
 	ListCell   *lc;
+	Relids		removedrels = NULL;
 
 	/*
-	 * We are only interested in relations that are left-joined to, so we can
-	 * scan the join_info_list to find them easily.
+	 * Start by analyzing INNER JOINed relations in order to determine if any
+	 * of the relations can be ignored.
 	 */
 restart:
+	foreach(lc, joinlist)
+	{
+		RangeTblRef		*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo		*rel;
+
+		if (!IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = root->simple_rel_array[rtr->rtindex];
+
+		/* Don't try to remove this one again if we've already removed it */
+		if ((rel->removal_flags & PLAN_SUITABILITY_FK_TRIGGER_EMPTY) != 0)
+			continue;
+
+		/* skip if the join can't be removed */
+		if (!innerjoin_is_removable(root, joinlist, rtr, removedrels))
+			continue;
+
+		/*
+		 * Since we're not actually removing the join here, we need to maintain
+		 * a list of relations that we've "removed" so when we're checking if
+		 * other relations can be removed we'll know that if the to be removed
+		 * relation is only referenced by a relation that we've already removed
+		 * that it can be safely assumed that the relation is not referenced by
+		 * any useful relation.
+		 */
+		removedrels = bms_add_member(removedrels, rtr->rtindex);
+
+		/*
+		 * Mark that this relation is only required when the foreign key trigger
+		 * queue is non-empty.
+		 */
+		rel->removal_flags |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Globally mark this plan to say that there are some relations which
+		 * are only required when the foreign key trigger queue is non-empty.
+		 * The planner will later generate 2 plans, 1 which is suitable only
+		 * when all of these bitmask conditions are met, and another which is
+		 * an all purpose plan, which will be used if *any* of the bitmask's
+		 * conditions are not met.
+		 */
+		root->glob->suitableFor |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that since we've added
+		 * this relation to the removedrels, we may now realize that other
+		 * relations can also be removed as they're only referenced by the one
+		 * that we've just marked as possibly removable).
+		 */
+		goto restart;
+	}
+
+	/* now process special joins. Currently only left joins are supported */
 	foreach(lc, root->join_info_list)
 	{
 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
 		int			innerrelid;
 		int			nremoved;
 
-		/* Skip if not removable */
-		if (!join_is_removable(root, sjinfo))
-			continue;
+		if (sjinfo->jointype == JOIN_LEFT)
+		{
+			/* Skip if not removable */
+			if (!leftjoin_is_removable(root, sjinfo))
+				continue;
+		}
+		else
+			continue; /* we don't support this join type */
 
 		/*
 		 * Currently, join_is_removable can only succeed when the sjinfo's
@@ -91,12 +177,11 @@ restart:
 		root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
 
 		/*
-		 * Restart the scan.  This is necessary to ensure we find all
-		 * removable joins independently of ordering of the join_info_list
-		 * (note that removal of attr_needed bits may make a join appear
-		 * removable that did not before).  Also, since we just deleted the
-		 * current list cell, we'd have to have some kluge to continue the
-		 * list scan anyway.
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that removal of
+		 * attr_needed bits may make a join, inner or outer, appear removable
+		 * that did not before).   Also, since we just deleted the current list
+		 * cell, we'd have to have some kluge to continue the list scan anyway.
 		 */
 		goto restart;
 	}
@@ -136,8 +221,226 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
 }
 
 /*
- * join_is_removable
- *	  Check whether we need not perform this special join at all, because
+ * innerjoin_is_removable
+ *		True if the join to removalrtr can be removed.
+ *
+ * In order to prove a relation which is inner joined is not required we must
+ * be sure that the join would emit exactly 1 row on the join condition. This
+ * differs from the logic which is used for proving LEFT JOINs can be removed,
+ * where it's possible to just check that a unique index exists on the relation
+ * being removed which has a set of columns that is a subset of the columns
+ * seen in the join condition. If no matching row is found then left join would
+ * not remove the non-matched row from the result set. This is not the case
+ * with INNER JOINs, so here we must use foreign keys as proof that the 1 row
+ * exists before we can allow any joins to be removed.
+ */
+static bool
+innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					   RangeTblRef *removalrtr, Relids ignoredrels)
+{
+	ListCell   *lc;
+	RelOptInfo *removalrel;
+
+	removalrel = find_base_rel(root, removalrtr->rtindex);
+
+	/*
+	 * As foreign keys may only reference base rels which have unique indexes,
+	 * we needn't go any further if we're not dealing with a base rel, or if
+	 * the base rel has no unique indexes. We'd also better abort if the
+	 * rtekind is anything but a relation, as things like sub-queries may have
+	 * grouping or distinct clauses that would cause us not to be able to use
+	 * the foreign key to prove the existence of a row matching the join
+	 * condition. We also abort if the rel has no eclass joins as such a rel
+	 * could well be joined using some operator which is not an equality
+	 * operator, or the rel may not even be inner joined at all.
+	 *
+	 * Here we actually only check if the rel has any indexes, ideally we'd be
+	 * checking for unique indexes, but we could only determine that by looping
+	 * over the indexlist, and this is likely too expensive a check to be worth
+	 * it here.
+	 */
+	if (removalrel->reloptkind != RELOPT_BASEREL ||
+		removalrel->rtekind != RTE_RELATION ||
+		removalrel->has_eclass_joins == false ||
+		removalrel->indexlist == NIL)
+		return false;
+
+	/*
+	 * Currently we disallow the removal if we find any baserestrictinfo items
+	 * on the relation being removed. The reason for this is that these would
+	 * filter out rows and make it so the foreign key cannot prove that we'll
+	 * match exactly 1 row on the join condition. However, this check is
+	 * currently probably a bit overly strict as it should be possible to just
+	 * check and ensure that each Var seen in the baserestrictinfo is also
+	 * present in an eclass and if so, just translate and move the whole
+	 * baserestrictinfo over to the relation which has the foreign key to prove
+	 * that this join is not needed. e.g:
+	 * SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id WHERE b.id = 1;
+	 * could become: SELECT a.* FROM a WHERE a.b_id = 1;
+	 */
+	if (removalrel->baserestrictinfo != NIL)
+		return false;
+
+	/*
+	 * Currently only eclass joins are supported, so if there are any non
+	 * eclass join quals then we'll report the join is non-removable.
+	 */
+	if (removalrel->joininfo != NIL)
+		return false;
+
+	/*
+	 * Now we'll search through each relation in the joinlist to see if we can
+	 * find a relation which has a foreign key which references removalrel on
+	 * the join condition. If we find a rel with a foreign key which matches
+	 * the join condition exactly, then we can be sure that exactly 1 row will
+	 * be matched on the join, if we also see that no Vars from the relation
+	 * are needed, then we can report the join as removable.
+	 */
+	foreach (lc, joinlist)
+	{
+		RangeTblRef	*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo	*rel;
+		ListCell	*lc2;
+		List		*referencing_vars;
+		List		*index_vars;
+		List		*operator_list;
+		Relids		 joinrelids;
+
+		/* we can't remove ourself, or anything other than RangeTblRefs */
+		if (rtr == removalrtr || !IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = find_base_rel(root, rtr->rtindex);
+
+		/*
+		 * The only relation type that can help us is a base rel with at least
+		 * one foreign key defined, if there's no eclass joins then this rel
+		 * is not going to help us prove the removalrel is not needed.
+		 */
+		if (rel->reloptkind != RELOPT_BASEREL ||
+			rel->rtekind != RTE_RELATION ||
+			rel->has_eclass_joins == false ||
+			rel->fklist == NIL)
+			continue;
+
+		/*
+		 * Both rels have eclass joins, but do they have eclass joins to each
+		 * other? Skip this rel if it does not.
+		 */
+		if (!have_relevant_eclass_joinclause(root, rel, removalrel))
+			continue;
+
+		joinrelids = bms_union(rel->relids, removalrel->relids);
+
+		/* if any of the Vars from the relation are needed then abort */
+		if (relation_is_needed(root, joinrelids, removalrel, ignoredrels))
+			return false;
+
+		referencing_vars = NIL;
+		index_vars = NIL;
+		operator_list = NIL;
+
+		/* now populate the lists with the join condition Vars */
+		foreach(lc2, root->eq_classes)
+		{
+			EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc2);
+
+			if (list_length(ec->ec_members) <= 1)
+				continue;
+
+			if (bms_overlap(removalrel->relids, ec->ec_relids) &&
+				bms_overlap(rel->relids, ec->ec_relids))
+			{
+				ListCell *lc3;
+				Var *refvar = NULL;
+				Var *idxvar = NULL;
+
+				/*
+				 * Look at each member of the eclass and try to find a Var from
+				 * each side of the join that we can append to the list of
+				 * columns that should be checked against each foreign key.
+				 *
+				 * The following logic does not allow for join removals to take
+				 * place for foreign keys that have duplicate columns on the
+				 * referencing side of the foreign key, such as:
+				 * (a,a) references (x,y)
+				 * The use case for such a foreign key is likely small enough
+				 * that we needn't bother making this code any more complex to
+				 * solve. If we find more than 1 Var from any of the rels then
+				 * we'll bail out.
+				 */
+				foreach (lc3, ec->ec_members)
+				{
+					EquivalenceMember *ecm = (EquivalenceMember *) lfirst(lc3);
+
+					Var *var = (Var *) ecm->em_expr;
+
+					if (!IsA(var, Var))
+						continue; /* Ignore Consts */
+
+					if (var->varno == rel->relid)
+					{
+						if (refvar != NULL)
+							return false;
+						refvar = var;
+					}
+
+					else if (var->varno == removalrel->relid)
+					{
+						if (idxvar != NULL)
+							return false;
+						idxvar = var;
+					}
+				}
+
+				if (refvar != NULL && idxvar != NULL)
+				{
+					Oid opno;
+					Oid reloid = root->simple_rte_array[refvar->varno]->relid;
+
+					/*
+					 * We cannot allow the removal to take place if any of the
+					 * columns in the join condition are nullable. This is due
+					 * to the fact that the join condition would end up
+					 * filtering out NULL values for us, but if we remove the
+					 * join, then there's nothing to stop the NULLs getting
+					 * into the resultset.
+					 */
+					if (!get_attnotnull(reloid, refvar->varattno))
+						return false;
+
+					/* grab the correct equality operator for these two vars */
+					opno = select_equality_operator(ec, refvar->vartype, idxvar->vartype);
+
+					if (!OidIsValid(opno))
+						return false;
+
+					referencing_vars = lappend(referencing_vars, refvar);
+					index_vars = lappend(index_vars, idxvar);
+					operator_list = lappend_oid(operator_list, opno);
+				}
+			}
+		}
+
+		/*
+		 * Did we find any conditions? It's ok that we just check 1 of the 3
+		 * lists to see if it's empty here as these will always contain the
+		 * same number of items
+		 */
+		if (referencing_vars != NIL)
+		{
+			if (relation_has_foreign_key_for(root, rel, removalrel,
+				referencing_vars, index_vars, operator_list))
+				return true; /* removalrel can be removed */
+		}
+	}
+
+	return false; /* can't remove join */
+}
+
+/*
+ * leftjoin_is_removable
+ *	  Check whether we need not perform this left join at all, because
  *	  it will just duplicate its left input.
  *
  * This is true for a left join for which the join condition cannot match
@@ -147,7 +450,7 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
  * above the join.
  */
 static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 {
 	int			innerrelid;
 	RelOptInfo *innerrel;
@@ -155,14 +458,14 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	Relids		joinrelids;
 	List	   *clause_list = NIL;
 	ListCell   *l;
-	int			attroff;
+
+	Assert(sjinfo->jointype == JOIN_LEFT);
 
 	/*
-	 * Must be a non-delaying left join to a single baserel, else we aren't
+	 * Must be a non-delaying join to a single baserel, else we aren't
 	 * going to be able to do anything with it.
 	 */
-	if (sjinfo->jointype != JOIN_LEFT ||
-		sjinfo->delay_upper_joins)
+	if (sjinfo->delay_upper_joins)
 		return false;
 
 	if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
@@ -206,52 +509,9 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
-	/*
-	 * We can't remove the join if any inner-rel attributes are used above the
-	 * join.
-	 *
-	 * Note that this test only detects use of inner-rel attributes in higher
-	 * join conditions and the target list.  There might be such attributes in
-	 * pushed-down conditions at this join, too.  We check that case below.
-	 *
-	 * As a micro-optimization, it seems better to start with max_attr and
-	 * count down rather than starting with min_attr and counting up, on the
-	 * theory that the system attributes are somewhat less likely to be wanted
-	 * and should be tested last.
-	 */
-	for (attroff = innerrel->max_attr - innerrel->min_attr;
-		 attroff >= 0;
-		 attroff--)
-	{
-		if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids))
-			return false;
-	}
-
-	/*
-	 * Similarly check that the inner rel isn't needed by any PlaceHolderVars
-	 * that will be used above the join.  We only need to fail if such a PHV
-	 * actually references some inner-rel attributes; but the correct check
-	 * for that is relatively expensive, so we first check against ph_eval_at,
-	 * which must mention the inner rel if the PHV uses any inner-rel attrs as
-	 * non-lateral references.  Note that if the PHV's syntactic scope is just
-	 * the inner rel, we can't drop the rel even if the PHV is variable-free.
-	 */
-	foreach(l, root->placeholder_list)
-	{
-		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
-
-		if (bms_is_subset(phinfo->ph_needed, joinrelids))
-			continue;			/* PHV is not used above the join */
-		if (bms_overlap(phinfo->ph_lateral, innerrel->relids))
-			return false;		/* it references innerrel laterally */
-		if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids))
-			continue;			/* it definitely doesn't reference innerrel */
-		if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
-			return false;		/* there isn't any other place to eval PHV */
-		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
-						innerrel->relids))
-			return false;		/* it does reference innerrel */
-	}
+	/* if the relation is referenced in the query then it cannot be removed */
+	if (relation_is_needed(root, joinrelids, innerrel, NULL))
+		return false;
 
 	/*
 	 * Search for mergejoinable clauses that constrain the inner rel against
@@ -368,6 +628,218 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	return false;
 }
 
+/*
+ * relation_is_needed
+ *		True if any of the Vars from this relation are required in the query
+ */
+static inline bool
+relation_is_needed(PlannerInfo *root, Relids joinrelids, RelOptInfo *rel, Relids ignoredrels)
+{
+	int		  attroff;
+	ListCell *l;
+
+	/*
+	 * rel is referenced if any of its attributes are used above the join.
+	 *
+	 * Note that this test only detects use of rel's attributes in higher
+	 * join conditions and the target list.  There might be such attributes in
+	 * pushed-down conditions at this join, too.  We check that case below.
+	 *
+	 * As a micro-optimization, it seems better to start with max_attr and
+	 * count down rather than starting with min_attr and counting up, on the
+	 * theory that the system attributes are somewhat less likely to be wanted
+	 * and should be tested last.
+	 */
+	for (attroff = rel->max_attr - rel->min_attr;
+		 attroff >= 0;
+		 attroff--)
+	{
+		if (!bms_is_subset(bms_difference(rel->attr_needed[attroff], ignoredrels), joinrelids))
+			return true;
+	}
+
+	/*
+	 * Similarly check that rel isn't needed by any PlaceHolderVars that will
+	 * be used above the join.  We only need to fail if such a PHV actually
+	 * references some of rel's attributes; but the correct check for that is
+	 * relatively expensive, so we first check against ph_eval_at, which must
+	 * mention rel if the PHV uses any of rel's attrs as non-lateral
+	 * references.  Note that if the PHV's syntactic scope is just rel, we
+	 * can't return true even if the PHV is variable-free.
+	 */
+	foreach(l, root->placeholder_list)
+	{
+		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
+
+		if (bms_is_subset(phinfo->ph_needed, joinrelids))
+			continue;			/* PHV is not used above the join */
+		if (bms_overlap(phinfo->ph_lateral, rel->relids))
+			return true;		/* it references rel laterally */
+		if (!bms_overlap(phinfo->ph_eval_at, rel->relids))
+			continue;			/* it definitely doesn't reference rel */
+		if (bms_is_subset(phinfo->ph_eval_at, rel->relids))
+			return true;		/* there isn't any other place to eval PHV */
+		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
+						rel->relids))
+			return true;		/* it does reference rel */
+	}
+
+	return false; /* it does not reference rel */
+}
+
+/*
+ * relation_has_foreign_key_for
+ *	  Checks if rel has a foreign key which references referencedrel with the
+ *	  given list of expressions.
+ *
+ *	For the match to succeed:
+ *	  referencing_vars must match the columns defined in the foreign key.
+ *	  index_vars must match the columns defined in the index for the foreign key.
+ */
+static bool
+relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+			RelOptInfo *referencedrel, List *referencing_vars,
+			List *index_vars, List *operator_list)
+{
+	ListCell *lc;
+	Oid		  refreloid;
+
+	/*
+	 * Look up the Oid of the referenced relation. We only want to look at
+	 * foreign keys on the referencing relation which reference this relation.
+	 */
+	refreloid = root->simple_rte_array[referencedrel->relid]->relid;
+
+	Assert(list_length(referencing_vars) > 0);
+	Assert(list_length(referencing_vars) == list_length(index_vars));
+	Assert(list_length(referencing_vars) == list_length(operator_list));
+
+	/*
+	 * Search through each foreign key on the referencing relation and try
+	 * to find one which references the relation in the join condition. If we
+	 * find one then we'll send the join conditions off to
+	 * expressions_match_foreign_key() to see if they match the foreign key.
+	 */
+	foreach(lc, rel->fklist)
+	{
+		ForeignKeyInfo *fk = (ForeignKeyInfo *) lfirst(lc);
+
+		if (fk->confrelid == refreloid)
+		{
+			if (expressions_match_foreign_key(fk, referencing_vars,
+				index_vars, operator_list))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * expressions_match_foreign_key
+ *		True if the given fkvars, indexvars and operators will match
+ *		exactly 1 record in the referenced relation of the foreign key.
+ *
+ * Note: This function expects fkvars and indexvars to only contain Var types.
+ *		 Expression indexes are not supported by foreign keys.
+ */
+static bool
+expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					List *indexvars, List *operators)
+{
+	ListCell  *lc;
+	ListCell  *lc2;
+	ListCell  *lc3;
+	Bitmapset *allitems;
+	Bitmapset *matcheditems;
+	int		   lstidx;
+	int		   col;
+
+	Assert(list_length(fkvars) == list_length(indexvars));
+	Assert(list_length(fkvars) == list_length(operators));
+
+	/*
+	 * Fast path out if there are not enough conditions to match each column
+	 * in the foreign key. Note that we cannot check that the number of
+	 * expressions is equal here since it would cause any expressions which
+	 * are duplicated not to match.
+	 */
+	if (list_length(fkvars) < fk->conncols)
+		return false;
+
+	/*
+	 * We need to ensure that each foreign key column can be matched to a list
+	 * item, and we need to ensure that each list item can be matched to a
+	 * foreign key column. We do this by looping over each foreign key column
+	 * and checking that we can find an item in the list which matches the
+	 * current column, however this method does not allow us to ensure that no
+	 * additional items exist in the list. We could solve that by performing
+	 * another loop over each list item and check that it matches a foreign key
+	 * column, but that's a bit wasteful. Instead we'll use 2 bitmapsets, one
+	 * to store the 0 based index of each list item, and with the other we'll
+	 * store each list index that we've managed to match. After we're done
+	 * matching we'll just make sure that both bitmapsets are equal.
+	 */
+	allitems = NULL;
+	matcheditems = NULL;
+
+	/*
+	 * Build a bitmapset which contains each 0 based list index. It seems more
+	 * efficient to do this in reverse so that we allocate enough memory for
+	 * the bitmapset on first loop rather than reallocating each time we find
+	 * we need a bit more space.
+	 */
+	for (lstidx = list_length(fkvars) - 1; lstidx >= 0; lstidx--)
+		allitems = bms_add_member(allitems, lstidx);
+
+	for (col = 0; col < fk->conncols; col++)
+	{
+		bool  matched = false;
+
+		lstidx = 0;
+
+		forthree(lc, fkvars, lc2, indexvars, lc3, operators)
+		{
+			Var *expr = (Var *) lfirst(lc);
+			Var *idxexpr = (Var *) lfirst(lc2);
+			Oid  opr = lfirst_oid(lc3);
+
+			Assert(IsA(expr, Var));
+			Assert(IsA(idxexpr, Var));
+
+			/* Does this join qual match up to the current fkey column? */
+			if (fk->conkey[col] == expr->varattno &&
+				fk->confkey[col] == idxexpr->varattno &&
+				equality_ops_are_compatible(opr, fk->conpfeqop[col]))
+			{
+				matched = true;
+
+				/* mark this list item as matched */
+				matcheditems = bms_add_member(matcheditems, lstidx);
+
+				/*
+				 * Don't break here as there may be duplicate expressions
+				 * that we also need to match against.
+				 */
+			}
+			lstidx++;
+		}
+
+		/* punt if there's no match. */
+		if (!matched)
+			return false;
+	}
+
+	/*
+	 * Ensure that we managed to match every item in the list to a foreign key
+	 * column.
+	 */
+	if (!bms_equal(allitems, matcheditems))
+		return false;
+
+	return true; /* matched */
+}
+
 
 /*
  * Remove the target relid from the planner's data structures, having
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 655be81..7540a1d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4676,6 +4676,15 @@ make_lockrows(Plan *lefttree, List *rowMarks, int epqParam)
 	return node;
 }
 
+AlternativePlan *
+make_alternativeplan(List *planlist)
+{
+	AlternativePlan *node = makeNode(AlternativePlan);
+	node->planList = planlist;
+
+	return node;
+}
+
 /*
  * Note: offset_est and count_est are passed in to save having to repeat
  * work already done to estimate the values of the limitOffset and limitCount
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index b90c2ef..5cd2ab5 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -409,6 +409,7 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	Path	   *sorted_path;
 	Cost		path_cost;
 	double		path_fraction;
+	List	   *final_rel_list;
 
 	/*----------
 	 * Generate modified query of the form
@@ -478,8 +479,12 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	subroot->tuple_fraction = 1.0;
 	subroot->limit_tuples = 1.0;
 
-	final_rel = query_planner(subroot, parse->targetList,
-							  minmax_qp_callback, NULL);
+	final_rel_list = query_planner(subroot, parse->targetList,
+							  minmax_qp_callback, NULL, true);
+
+	Assert(list_length(final_rel_list) == 1);
+
+	final_rel = (RelOptInfo *) linitial(final_rel_list);
 
 	/*
 	 * Get the best presorted path, that being the one that's cheapest for
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 848df97..6cf5915 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -34,7 +34,7 @@
  *
  * Since query_planner does not handle the toplevel processing (grouping,
  * sorting, etc) it cannot select the best path by itself.  Instead, it
- * returns the RelOptInfo for the top level of joining, and the caller
+ * returns a list of RelOptInfo for the top level of joining, and the caller
  * (grouping_planner) can choose one of the surviving paths for the rel.
  * Normally it would choose either the rel's cheapest path, or the cheapest
  * path for the desired sort order.
@@ -50,14 +50,23 @@
  * plan.  This value is *not* available at call time, but is computed by
  * qp_callback once we have completed merging the query's equivalence classes.
  * (We cannot construct canonical pathkeys until that's done.)
+ *
+ * Note: during the planning process, the planner may discover optimization
+ * opportunities that may or may not be possible to utilize during query
+ * execution. In this case the planner will generate 2 plans. 1 for the fully
+ * optimized version, and 1 all purpose plan which will only be used if
+ * conditions are not found to be favourable for the optimized version of the
+ * plan during executor startup.
  */
-RelOptInfo *
+List *
 query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra)
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool all_purpose_plan_only)
 {
 	Query	   *parse = root->parse;
 	List	   *joinlist;
 	RelOptInfo *final_rel;
+	List	   *final_rel_list = NIL;
 	Index		rti;
 	double		total_pages;
 
@@ -84,7 +93,7 @@ query_planner(PlannerInfo *root, List *tlist,
 		root->canon_pathkeys = NIL;
 		(*qp_callback) (root, qp_extra);
 
-		return final_rel;
+		return lappend(NIL, final_rel);
 	}
 
 	/*
@@ -231,14 +240,37 @@ query_planner(PlannerInfo *root, List *tlist,
 	root->total_table_pages = total_pages;
 
 	/*
-	 * Ready to do the primary planning.
+	 * If the planner found any optimizations that caused the plan not to be
+	 * suitable in all situations, then we must create 2 plans. One will be
+	 * the fully optimized version and the other will be a general purpose
+	 * plan that will only be used by the executor if any of the required
+	 * conditions for the optimization were not met. Note that we'll only
+	 * generate an optimized plan if the caller didn't specifically request an
+	 * all purpose plan.
 	 */
-	final_rel = make_one_rel(root, joinlist);
+	if (root->glob->suitableFor != PLAN_SUITABILITY_ALL_PURPOSE
+		&& all_purpose_plan_only == false)
+	{
+		/* Generate fully optimized plan, with all removable joins removed */
+		final_rel = make_one_rel(root, joinlist, root->glob->suitableFor);
+
+		/* Check that we got at least one usable path */
+		if (!final_rel || !final_rel->cheapest_total_path ||
+			final_rel->cheapest_total_path->param_info != NULL)
+			elog(ERROR, "failed to construct the join relation");
+
+		final_rel_list = lappend(final_rel_list, final_rel);
+	}
+
+	/* generate an all purpose plan */
+	final_rel = make_one_rel(root, joinlist, PLAN_SUITABILITY_ALL_PURPOSE);
 
 	/* Check that we got at least one usable path */
 	if (!final_rel || !final_rel->cheapest_total_path ||
 		final_rel->cheapest_total_path->param_info != NULL)
 		elog(ERROR, "failed to construct the join relation");
 
-	return final_rel;
+	final_rel_list = lappend(final_rel_list, final_rel);
+
+	return final_rel_list;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9cbbcfb..7ca31e3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -178,6 +178,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	glob->lastRowMarkId = 0;
 	glob->transientPlan = false;
 	glob->hasRowSecurity = false;
+	glob->suitableFor = PLAN_SUITABILITY_ALL_PURPOSE;
 
 	/* Determine what fraction of the plan is likely to be scanned */
 	if (cursorOptions & CURSOR_OPT_FAST_PLAN)
@@ -256,6 +257,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->invalItems = glob->invalItems;
 	result->nParamExec = glob->nParamExec;
 	result->hasRowSecurity = glob->hasRowSecurity;
+	result->suitableFor = glob->suitableFor;
 
 	return result;
 }
@@ -1087,10 +1089,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	int64		count_est = 0;
 	double		limit_tuples = -1.0;
 	Plan	   *result_plan;
+	List	   *result_plan_list = NIL;
 	List	   *current_pathkeys;
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	ListCell   *lc;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1169,6 +1173,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		root->sort_pathkeys = make_pathkeys_for_sortclauses(root,
 															parse->sortClause,
 															tlist);
+
+		result_plan_list = list_make1(result_plan);
 	}
 	else
 	{
@@ -1178,6 +1184,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		bool		need_tlist_eval = true;
 		standard_qp_extra qp_extra;
 		RelOptInfo *final_rel;
+		List	   *final_rel_list;
 		Path	   *cheapest_path;
 		Path	   *sorted_path;
 		Path	   *best_path;
@@ -1288,710 +1295,723 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		 * standard_qp_callback) pathkey representations of the query's sort
 		 * clause, distinct clause, etc.
 		 */
-		final_rel = query_planner(root, sub_tlist,
-								  standard_qp_callback, &qp_extra);
-
-		/*
-		 * Extract rowcount and width estimates for use below.
-		 */
-		path_rows = final_rel->rows;
-		path_width = final_rel->width;
+		final_rel_list = query_planner(root, sub_tlist,
+							  standard_qp_callback, &qp_extra, false);
 
-		/*
-		 * If there's grouping going on, estimate the number of result groups.
-		 * We couldn't do this any earlier because it depends on relation size
-		 * estimates that are created within query_planner().
-		 *
-		 * Then convert tuple_fraction to fractional form if it is absolute,
-		 * and if grouping or aggregation is involved, adjust tuple_fraction
-		 * to describe the fraction of the underlying un-aggregated tuples
-		 * that will be fetched.
-		 */
-		dNumGroups = 1;			/* in case not grouping */
-
-		if (parse->groupClause)
+		foreach(lc, final_rel_list)
 		{
-			List	   *groupExprs;
-
-			groupExprs = get_sortgrouplist_exprs(parse->groupClause,
-												 parse->targetList);
-			dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
-
+			final_rel = (RelOptInfo *) lfirst(lc);
 			/*
-			 * In GROUP BY mode, an absolute LIMIT is relative to the number
-			 * of groups not the number of tuples.  If the caller gave us a
-			 * fraction, keep it as-is.  (In both cases, we are effectively
-			 * assuming that all the groups are about the same size.)
+			 * Extract rowcount and width estimates for use below.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
+			path_rows = final_rel->rows;
+			path_width = final_rel->width;
 
 			/*
-			 * If both GROUP BY and ORDER BY are specified, we will need two
-			 * levels of sort --- and, therefore, certainly need to read all
-			 * the tuples --- unless ORDER BY is a subset of GROUP BY.
-			 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
-			 * window specification not compatible with the GROUP BY.
-			 */
-			if (!pathkeys_contained_in(root->sort_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->distinct_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->window_pathkeys,
-									   root->group_pathkeys))
-				tuple_fraction = 0.0;
-		}
-		else if (parse->hasAggs || root->hasHavingQual)
-		{
-			/*
-			 * Ungrouped aggregate will certainly want to read all the tuples,
-			 * and it will deliver a single result row (so leave dNumGroups
-			 * set to 1).
-			 */
-			tuple_fraction = 0.0;
-		}
-		else if (parse->distinctClause)
-		{
-			/*
-			 * Since there was no grouping or aggregation, it's reasonable to
-			 * assume the UNIQUE filter has effects comparable to GROUP BY.
-			 * (If DISTINCT is used with grouping, we ignore its effects for
-			 * rowcount estimation purposes; this amounts to assuming the
-			 * grouped rows are distinct already.)
-			 */
-			List	   *distinctExprs;
-
-			distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-													parse->targetList);
-			dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
-
-			/*
-			 * Adjust tuple_fraction the same way as for GROUP BY, too.
-			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
-		}
-		else
-		{
-			/*
-			 * Plain non-grouped, non-aggregated query: an absolute tuple
-			 * fraction can be divided by the number of tuples.
+			 * If there's grouping going on, estimate the number of result groups.
+			 * We couldn't do this any earlier because it depends on relation size
+			 * estimates that are created within query_planner().
+			 *
+			 * Then convert tuple_fraction to fractional form if it is absolute,
+			 * and if grouping or aggregation is involved, adjust tuple_fraction
+			 * to describe the fraction of the underlying un-aggregated tuples
+			 * that will be fetched.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= path_rows;
-		}
+			dNumGroups = 1;			/* in case not grouping */
 
-		/*
-		 * Pick out the cheapest-total path as well as the cheapest presorted
-		 * path for the requested pathkeys (if there is one).  We should take
-		 * the tuple fraction into account when selecting the cheapest
-		 * presorted path, but not when selecting the cheapest-total path,
-		 * since if we have to sort then we'll have to fetch all the tuples.
-		 * (But there's a special case: if query_pathkeys is NIL, meaning
-		 * order doesn't matter, then the "cheapest presorted" path will be
-		 * the cheapest overall for the tuple fraction.)
-		 */
-		cheapest_path = final_rel->cheapest_total_path;
-
-		sorted_path =
-			get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
-													  root->query_pathkeys,
-													  NULL,
-													  tuple_fraction);
+			if (parse->groupClause)
+			{
+				List	   *groupExprs;
 
-		/* Don't consider same path in both guises; just wastes effort */
-		if (sorted_path == cheapest_path)
-			sorted_path = NULL;
+				groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+													 parse->targetList);
+				dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
 
-		/*
-		 * Forget about the presorted path if it would be cheaper to sort the
-		 * cheapest-total path.  Here we need consider only the behavior at
-		 * the tuple_fraction point.  Also, limit_tuples is only relevant if
-		 * not grouping/aggregating, so use root->limit_tuples in the
-		 * cost_sort call.
-		 */
-		if (sorted_path)
-		{
-			Path		sort_path;		/* dummy for result of cost_sort */
+				/*
+				 * In GROUP BY mode, an absolute LIMIT is relative to the number
+				 * of groups not the number of tuples.  If the caller gave us a
+				 * fraction, keep it as-is.  (In both cases, we are effectively
+				 * assuming that all the groups are about the same size.)
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
 
-			if (root->query_pathkeys == NIL ||
-				pathkeys_contained_in(root->query_pathkeys,
-									  cheapest_path->pathkeys))
-			{
-				/* No sort needed for cheapest path */
-				sort_path.startup_cost = cheapest_path->startup_cost;
-				sort_path.total_cost = cheapest_path->total_cost;
+				/*
+				 * If both GROUP BY and ORDER BY are specified, we will need two
+				 * levels of sort --- and, therefore, certainly need to read all
+				 * the tuples --- unless ORDER BY is a subset of GROUP BY.
+				 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
+				 * window specification not compatible with the GROUP BY.
+				 */
+				if (!pathkeys_contained_in(root->sort_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->distinct_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->window_pathkeys,
+										   root->group_pathkeys))
+					tuple_fraction = 0.0;
 			}
-			else
+			else if (parse->hasAggs || root->hasHavingQual)
 			{
-				/* Figure cost for sorting */
-				cost_sort(&sort_path, root, root->query_pathkeys,
-						  cheapest_path->total_cost,
-						  path_rows, path_width,
-						  0.0, work_mem, root->limit_tuples);
+				/*
+				 * Ungrouped aggregate will certainly want to read all the tuples,
+				 * and it will deliver a single result row (so leave dNumGroups
+				 * set to 1).
+				 */
+				tuple_fraction = 0.0;
 			}
-
-			if (compare_fractional_path_costs(sorted_path, &sort_path,
-											  tuple_fraction) > 0)
+			else if (parse->distinctClause)
 			{
-				/* Presorted path is a loser */
-				sorted_path = NULL;
-			}
-		}
+				/*
+				 * Since there was no grouping or aggregation, it's reasonable to
+				 * assume the UNIQUE filter has effects comparable to GROUP BY.
+				 * (If DISTINCT is used with grouping, we ignore its effects for
+				 * rowcount estimation purposes; this amounts to assuming the
+				 * grouped rows are distinct already.)
+				 */
+				List	   *distinctExprs;
 
-		/*
-		 * Consider whether we want to use hashing instead of sorting.
-		 */
-		if (parse->groupClause)
-		{
-			/*
-			 * If grouping, decide whether to use sorted or hashed grouping.
-			 */
-			use_hashed_grouping =
-				choose_hashed_grouping(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path, sorted_path,
-									   dNumGroups, &agg_costs);
-			/* Also convert # groups to long int --- but 'ware overflow! */
-			numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
-		}
-		else if (parse->distinctClause && sorted_path &&
-				 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
-		{
-			/*
-			 * We'll reach the DISTINCT stage without any intermediate
-			 * processing, so figure out whether we will want to hash or not
-			 * so we can choose whether to use cheapest or sorted path.
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path->startup_cost,
-									   cheapest_path->total_cost,
-									   sorted_path->startup_cost,
-									   sorted_path->total_cost,
-									   sorted_path->pathkeys,
-									   dNumGroups);
-			tested_hashed_distinct = true;
-		}
+				distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														parse->targetList);
+				dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
 
-		/*
-		 * Select the best path.  If we are doing hashed grouping, we will
-		 * always read all the input tuples, so use the cheapest-total path.
-		 * Otherwise, the comparison above is correct.
-		 */
-		if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
-			best_path = cheapest_path;
-		else
-			best_path = sorted_path;
+				/*
+				 * Adjust tuple_fraction the same way as for GROUP BY, too.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
+			}
+			else
+			{
+				/*
+				 * Plain non-grouped, non-aggregated query: an absolute tuple
+				 * fraction can be divided by the number of tuples.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= path_rows;
+			}
 
-		/*
-		 * Check to see if it's possible to optimize MIN/MAX aggregates. If
-		 * so, we will forget all the work we did so far to choose a "regular"
-		 * path ... but we had to do it anyway to be able to tell which way is
-		 * cheaper.
-		 */
-		result_plan = optimize_minmax_aggregates(root,
-												 tlist,
-												 &agg_costs,
-												 best_path);
-		if (result_plan != NULL)
-		{
-			/*
-			 * optimize_minmax_aggregates generated the full plan, with the
-			 * right tlist, and it has no sort order.
-			 */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Normal case --- create a plan according to query_planner's
-			 * results.
+			 * Pick out the cheapest-total path as well as the cheapest presorted
+			 * path for the requested pathkeys (if there is one).  We should take
+			 * the tuple fraction into account when selecting the cheapest
+			 * presorted path, but not when selecting the cheapest-total path,
+			 * since if we have to sort then we'll have to fetch all the tuples.
+			 * (But there's a special case: if query_pathkeys is NIL, meaning
+			 * order doesn't matter, then the "cheapest presorted" path will be
+			 * the cheapest overall for the tuple fraction.)
 			 */
-			bool		need_sort_for_grouping = false;
+			cheapest_path = final_rel->cheapest_total_path;
 
-			result_plan = create_plan(root, best_path);
-			current_pathkeys = best_path->pathkeys;
+			sorted_path =
+				get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+														  root->query_pathkeys,
+														  NULL,
+														  tuple_fraction);
 
-			/* Detect if we'll need an explicit sort for grouping */
-			if (parse->groupClause && !use_hashed_grouping &&
-			  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
-			{
-				need_sort_for_grouping = true;
-
-				/*
-				 * Always override create_plan's tlist, so that we don't sort
-				 * useless data from a "physical" tlist.
-				 */
-				need_tlist_eval = true;
-			}
+			/* Don't consider same path in both guises; just wastes effort */
+			if (sorted_path == cheapest_path)
+				sorted_path = NULL;
 
 			/*
-			 * create_plan returns a plan with just a "flat" tlist of required
-			 * Vars.  Usually we need to insert the sub_tlist as the tlist of
-			 * the top plan node.  However, we can skip that if we determined
-			 * that whatever create_plan chose to return will be good enough.
+			 * Forget about the presorted path if it would be cheaper to sort the
+			 * cheapest-total path.  Here we need consider only the behavior at
+			 * the tuple_fraction point.  Also, limit_tuples is only relevant if
+			 * not grouping/aggregating, so use root->limit_tuples in the
+			 * cost_sort call.
 			 */
-			if (need_tlist_eval)
+			if (sorted_path)
 			{
-				/*
-				 * If the top-level plan node is one that cannot do expression
-				 * evaluation and its existing target list isn't already what
-				 * we need, we must insert a Result node to project the
-				 * desired tlist.
-				 */
-				if (!is_projection_capable_plan(result_plan) &&
-					!tlist_same_exprs(sub_tlist, result_plan->targetlist))
+				Path		sort_path;		/* dummy for result of cost_sort */
+
+				if (root->query_pathkeys == NIL ||
+					pathkeys_contained_in(root->query_pathkeys,
+										  cheapest_path->pathkeys))
 				{
-					result_plan = (Plan *) make_result(root,
-													   sub_tlist,
-													   NULL,
-													   result_plan);
+					/* No sort needed for cheapest path */
+					sort_path.startup_cost = cheapest_path->startup_cost;
+					sort_path.total_cost = cheapest_path->total_cost;
 				}
 				else
 				{
-					/*
-					 * Otherwise, just replace the subplan's flat tlist with
-					 * the desired tlist.
-					 */
-					result_plan->targetlist = sub_tlist;
+					/* Figure cost for sorting */
+					cost_sort(&sort_path, root, root->query_pathkeys,
+							  cheapest_path->total_cost,
+							  path_rows, path_width,
+							  0.0, work_mem, root->limit_tuples);
 				}
 
+				if (compare_fractional_path_costs(sorted_path, &sort_path,
+												  tuple_fraction) > 0)
+				{
+					/* Presorted path is a loser */
+					sorted_path = NULL;
+				}
+			}
+
+			/*
+			 * Consider whether we want to use hashing instead of sorting.
+			 */
+			if (parse->groupClause)
+			{
 				/*
-				 * Also, account for the cost of evaluation of the sub_tlist.
-				 * See comments for add_tlist_costs_to_plan() for more info.
+				 * If grouping, decide whether to use sorted or hashed grouping.
 				 */
-				add_tlist_costs_to_plan(root, result_plan, sub_tlist);
+				use_hashed_grouping =
+					choose_hashed_grouping(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path, sorted_path,
+										   dNumGroups, &agg_costs);
+				/* Also convert # groups to long int --- but 'ware overflow! */
+				numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 			}
-			else
+			else if (parse->distinctClause && sorted_path &&
+					 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
 			{
 				/*
-				 * Since we're using create_plan's tlist and not the one
-				 * make_subplanTargetList calculated, we have to refigure any
-				 * grouping-column indexes make_subplanTargetList computed.
+				 * We'll reach the DISTINCT stage without any intermediate
+				 * processing, so figure out whether we will want to hash or not
+				 * so we can choose whether to use cheapest or sorted path.
 				 */
-				locate_grouping_columns(root, tlist, result_plan->targetlist,
-										groupColIdx);
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path->startup_cost,
+										   cheapest_path->total_cost,
+										   sorted_path->startup_cost,
+										   sorted_path->total_cost,
+										   sorted_path->pathkeys,
+										   dNumGroups);
+				tested_hashed_distinct = true;
 			}
 
 			/*
-			 * Insert AGG or GROUP node if needed, plus an explicit sort step
-			 * if necessary.
-			 *
-			 * HAVING clause, if any, becomes qual of the Agg or Group node.
+			 * Select the best path.  If we are doing hashed grouping, we will
+			 * always read all the input tuples, so use the cheapest-total path.
+			 * Otherwise, the comparison above is correct.
 			 */
-			if (use_hashed_grouping)
+			if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
+				best_path = cheapest_path;
+			else
+				best_path = sorted_path;
+
+			/*
+			 * Check to see if it's possible to optimize MIN/MAX aggregates. If
+			 * so, we will forget all the work we did so far to choose a "regular"
+			 * path ... but we had to do it anyway to be able to tell which way is
+			 * cheaper.
+			 */
+			result_plan = optimize_minmax_aggregates(root,
+													 tlist,
+													 &agg_costs,
+													 best_path);
+			if (result_plan != NULL)
 			{
-				/* Hashed aggregate plan --- no sort needed */
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												AGG_HASHED,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-				/* Hashed aggregation produces randomly-ordered results */
+				/*
+				 * optimize_minmax_aggregates generated the full plan, with the
+				 * right tlist, and it has no sort order.
+				 */
 				current_pathkeys = NIL;
 			}
-			else if (parse->hasAggs)
+			else
 			{
-				/* Plain aggregate plan --- sort if needed */
-				AggStrategy aggstrategy;
+				/*
+				 * Normal case --- create a plan according to query_planner's
+				 * results.
+				 */
+				bool		need_sort_for_grouping = false;
+
+				result_plan = create_plan(root, best_path);
+				current_pathkeys = best_path->pathkeys;
 
-				if (parse->groupClause)
+				/* Detect if we'll need an explicit sort for grouping */
+				if (parse->groupClause && !use_hashed_grouping &&
+				  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
 				{
-					if (need_sort_for_grouping)
+					need_sort_for_grouping = true;
+
+					/*
+					 * Always override create_plan's tlist, so that we don't sort
+					 * useless data from a "physical" tlist.
+					 */
+					need_tlist_eval = true;
+				}
+
+				/*
+				 * create_plan returns a plan with just a "flat" tlist of required
+				 * Vars.  Usually we need to insert the sub_tlist as the tlist of
+				 * the top plan node.  However, we can skip that if we determined
+				 * that whatever create_plan chose to return will be good enough.
+				 */
+				if (need_tlist_eval)
+				{
+					/*
+					 * If the top-level plan node is one that cannot do expression
+					 * evaluation and its existing target list isn't already what
+					 * we need, we must insert a Result node to project the
+					 * desired tlist.
+					 */
+					if (!is_projection_capable_plan(result_plan) &&
+						!tlist_same_exprs(sub_tlist, result_plan->targetlist))
 					{
-						result_plan = (Plan *)
-							make_sort_from_groupcols(root,
-													 parse->groupClause,
-													 groupColIdx,
-													 result_plan);
-						current_pathkeys = root->group_pathkeys;
+						result_plan = (Plan *) make_result(root,
+														   sub_tlist,
+														   NULL,
+														   result_plan);
+					}
+					else
+					{
+						/*
+						 * Otherwise, just replace the subplan's flat tlist with
+						 * the desired tlist.
+						 */
+						result_plan->targetlist = sub_tlist;
 					}
-					aggstrategy = AGG_SORTED;
 
 					/*
-					 * The AGG node will not change the sort ordering of its
-					 * groups, so current_pathkeys describes the result too.
+					 * Also, account for the cost of evaluation of the sub_tlist.
+					 * See comments for add_tlist_costs_to_plan() for more info.
 					 */
+					add_tlist_costs_to_plan(root, result_plan, sub_tlist);
 				}
 				else
 				{
-					aggstrategy = AGG_PLAIN;
-					/* Result will be only one row anyway; no sort order */
-					current_pathkeys = NIL;
+					/*
+					 * Since we're using create_plan's tlist and not the one
+					 * make_subplanTargetList calculated, we have to refigure any
+					 * grouping-column indexes make_subplanTargetList computed.
+					 */
+					locate_grouping_columns(root, tlist, result_plan->targetlist,
+											groupColIdx);
 				}
 
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												aggstrategy,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-			}
-			else if (parse->groupClause)
-			{
 				/*
-				 * GROUP BY without aggregation, so insert a group node (plus
-				 * the appropriate sort node, if necessary).
+				 * Insert AGG or GROUP node if needed, plus an explicit sort step
+				 * if necessary.
 				 *
-				 * Add an explicit sort if we couldn't make the path come out
-				 * the way the GROUP node needs it.
+				 * HAVING clause, if any, becomes qual of the Agg or Group node.
 				 */
-				if (need_sort_for_grouping)
+				if (use_hashed_grouping)
 				{
-					result_plan = (Plan *)
-						make_sort_from_groupcols(root,
-												 parse->groupClause,
-												 groupColIdx,
-												 result_plan);
-					current_pathkeys = root->group_pathkeys;
+					/* Hashed aggregate plan --- no sort needed */
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													AGG_HASHED,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+					/* Hashed aggregation produces randomly-ordered results */
+					current_pathkeys = NIL;
 				}
+				else if (parse->hasAggs)
+				{
+					/* Plain aggregate plan --- sort if needed */
+					AggStrategy aggstrategy;
 
-				result_plan = (Plan *) make_group(root,
-												  tlist,
-												  (List *) parse->havingQual,
-												  numGroupCols,
-												  groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												  dNumGroups,
-												  result_plan);
-				/* The Group node won't change sort ordering */
-			}
-			else if (root->hasHavingQual)
-			{
-				/*
-				 * No aggregates, and no GROUP BY, but we have a HAVING qual.
-				 * This is a degenerate case in which we are supposed to emit
-				 * either 0 or 1 row depending on whether HAVING succeeds.
-				 * Furthermore, there cannot be any variables in either HAVING
-				 * or the targetlist, so we actually do not need the FROM
-				 * table at all!  We can just throw away the plan-so-far and
-				 * generate a Result node.  This is a sufficiently unusual
-				 * corner case that it's not worth contorting the structure of
-				 * this routine to avoid having to generate the plan in the
-				 * first place.
-				 */
-				result_plan = (Plan *) make_result(root,
-												   tlist,
-												   parse->havingQual,
-												   NULL);
-			}
-		}						/* end of non-minmax-aggregate case */
-
-		/*
-		 * Since each window function could require a different sort order, we
-		 * stack up a WindowAgg node for each window, with sort steps between
-		 * them as needed.
-		 */
-		if (activeWindows)
-		{
-			List	   *window_tlist;
-			ListCell   *l;
+					if (parse->groupClause)
+					{
+						if (need_sort_for_grouping)
+						{
+							result_plan = (Plan *)
+								make_sort_from_groupcols(root,
+														 parse->groupClause,
+														 groupColIdx,
+														 result_plan);
+							current_pathkeys = root->group_pathkeys;
+						}
+						aggstrategy = AGG_SORTED;
+
+						/*
+						 * The AGG node will not change the sort ordering of its
+						 * groups, so current_pathkeys describes the result too.
+						 */
+					}
+					else
+					{
+						aggstrategy = AGG_PLAIN;
+						/* Result will be only one row anyway; no sort order */
+						current_pathkeys = NIL;
+					}
 
-			/*
-			 * If the top-level plan node is one that cannot do expression
-			 * evaluation, we must insert a Result node to project the desired
-			 * tlist.  (In some cases this might not really be required, but
-			 * it's not worth trying to avoid it.  In particular, think not to
-			 * skip adding the Result if the initial window_tlist matches the
-			 * top-level plan node's output, because we might change the tlist
-			 * inside the following loop.)	Note that on second and subsequent
-			 * passes through the following loop, the top-level node will be a
-			 * WindowAgg which we know can project; so we only need to check
-			 * once.
-			 */
-			if (!is_projection_capable_plan(result_plan))
-			{
-				result_plan = (Plan *) make_result(root,
-												   NIL,
-												   NULL,
-												   result_plan);
-			}
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													aggstrategy,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+				}
+				else if (parse->groupClause)
+				{
+					/*
+					 * GROUP BY without aggregation, so insert a group node (plus
+					 * the appropriate sort node, if necessary).
+					 *
+					 * Add an explicit sort if we couldn't make the path come out
+					 * the way the GROUP node needs it.
+					 */
+					if (need_sort_for_grouping)
+					{
+						result_plan = (Plan *)
+							make_sort_from_groupcols(root,
+													 parse->groupClause,
+													 groupColIdx,
+													 result_plan);
+						current_pathkeys = root->group_pathkeys;
+					}
 
-			/*
-			 * The "base" targetlist for all steps of the windowing process is
-			 * a flat tlist of all Vars and Aggs needed in the result.  (In
-			 * some cases we wouldn't need to propagate all of these all the
-			 * way to the top, since they might only be needed as inputs to
-			 * WindowFuncs.  It's probably not worth trying to optimize that
-			 * though.)  We also add window partitioning and sorting
-			 * expressions to the base tlist, to ensure they're computed only
-			 * once at the bottom of the stack (that's critical for volatile
-			 * functions).  As we climb up the stack, we'll add outputs for
-			 * the WindowFuncs computed at each level.
-			 */
-			window_tlist = make_windowInputTargetList(root,
+					result_plan = (Plan *) make_group(root,
 													  tlist,
-													  activeWindows);
+													  (List *) parse->havingQual,
+													  numGroupCols,
+													  groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													  dNumGroups,
+													  result_plan);
+					/* The Group node won't change sort ordering */
+				}
+				else if (root->hasHavingQual)
+				{
+					/*
+					 * No aggregates, and no GROUP BY, but we have a HAVING qual.
+					 * This is a degenerate case in which we are supposed to emit
+					 * either 0 or 1 row depending on whether HAVING succeeds.
+					 * Furthermore, there cannot be any variables in either HAVING
+					 * or the targetlist, so we actually do not need the FROM
+					 * table at all!  We can just throw away the plan-so-far and
+					 * generate a Result node.  This is a sufficiently unusual
+					 * corner case that it's not worth contorting the structure of
+					 * this routine to avoid having to generate the plan in the
+					 * first place.
+					 */
+					result_plan = (Plan *) make_result(root,
+													   tlist,
+													   parse->havingQual,
+													   NULL);
+				}
+			}						/* end of non-minmax-aggregate case */
 
 			/*
-			 * The copyObject steps here are needed to ensure that each plan
-			 * node has a separately modifiable tlist.  (XXX wouldn't a
-			 * shallow list copy do for that?)
+			 * Since each window function could require a different sort order, we
+			 * stack up a WindowAgg node for each window, with sort steps between
+			 * them as needed.
 			 */
-			result_plan->targetlist = (List *) copyObject(window_tlist);
-
-			foreach(l, activeWindows)
+			if (activeWindows)
 			{
-				WindowClause *wc = (WindowClause *) lfirst(l);
-				List	   *window_pathkeys;
-				int			partNumCols;
-				AttrNumber *partColIdx;
-				Oid		   *partOperators;
-				int			ordNumCols;
-				AttrNumber *ordColIdx;
-				Oid		   *ordOperators;
-
-				window_pathkeys = make_pathkeys_for_window(root,
-														   wc,
-														   tlist);
+				List	   *window_tlist;
+				ListCell   *l;
 
 				/*
-				 * This is a bit tricky: we build a sort node even if we don't
-				 * really have to sort.  Even when no explicit sort is needed,
-				 * we need to have suitable resjunk items added to the input
-				 * plan's tlist for any partitioning or ordering columns that
-				 * aren't plain Vars.  (In theory, make_windowInputTargetList
-				 * should have provided all such columns, but let's not assume
-				 * that here.)	Furthermore, this way we can use existing
-				 * infrastructure to identify which input columns are the
-				 * interesting ones.
+				 * If the top-level plan node is one that cannot do expression
+				 * evaluation, we must insert a Result node to project the desired
+				 * tlist.  (In some cases this might not really be required, but
+				 * it's not worth trying to avoid it.  In particular, think not to
+				 * skip adding the Result if the initial window_tlist matches the
+				 * top-level plan node's output, because we might change the tlist
+				 * inside the following loop.)	Note that on second and subsequent
+				 * passes through the following loop, the top-level node will be a
+				 * WindowAgg which we know can project; so we only need to check
+				 * once.
 				 */
-				if (window_pathkeys)
-				{
-					Sort	   *sort_plan;
-
-					sort_plan = make_sort_from_pathkeys(root,
-														result_plan,
-														window_pathkeys,
-														-1.0);
-					if (!pathkeys_contained_in(window_pathkeys,
-											   current_pathkeys))
-					{
-						/* we do indeed need to sort */
-						result_plan = (Plan *) sort_plan;
-						current_pathkeys = window_pathkeys;
-					}
-					/* In either case, extract the per-column information */
-					get_column_info_for_window(root, wc, tlist,
-											   sort_plan->numCols,
-											   sort_plan->sortColIdx,
-											   &partNumCols,
-											   &partColIdx,
-											   &partOperators,
-											   &ordNumCols,
-											   &ordColIdx,
-											   &ordOperators);
-				}
-				else
+				if (!is_projection_capable_plan(result_plan))
 				{
-					/* empty window specification, nothing to sort */
-					partNumCols = 0;
-					partColIdx = NULL;
-					partOperators = NULL;
-					ordNumCols = 0;
-					ordColIdx = NULL;
-					ordOperators = NULL;
+					result_plan = (Plan *) make_result(root,
+													   NIL,
+													   NULL,
+													   result_plan);
 				}
 
-				if (lnext(l))
-				{
-					/* Add the current WindowFuncs to the running tlist */
-					window_tlist = add_to_flat_tlist(window_tlist,
-										   wflists->windowFuncs[wc->winref]);
-				}
-				else
+				/*
+				 * The "base" targetlist for all steps of the windowing process is
+				 * a flat tlist of all Vars and Aggs needed in the result.  (In
+				 * some cases we wouldn't need to propagate all of these all the
+				 * way to the top, since they might only be needed as inputs to
+				 * WindowFuncs.  It's probably not worth trying to optimize that
+				 * though.)  We also add window partitioning and sorting
+				 * expressions to the base tlist, to ensure they're computed only
+				 * once at the bottom of the stack (that's critical for volatile
+				 * functions).  As we climb up the stack, we'll add outputs for
+				 * the WindowFuncs computed at each level.
+				 */
+				window_tlist = make_windowInputTargetList(root,
+														  tlist,
+														  activeWindows);
+
+				/*
+				 * The copyObject steps here are needed to ensure that each plan
+				 * node has a separately modifiable tlist.  (XXX wouldn't a
+				 * shallow list copy do for that?)
+				 */
+				result_plan->targetlist = (List *) copyObject(window_tlist);
+
+				foreach(l, activeWindows)
 				{
-					/* Install the original tlist in the topmost WindowAgg */
-					window_tlist = tlist;
-				}
+					WindowClause *wc = (WindowClause *) lfirst(l);
+					List	   *window_pathkeys;
+					int			partNumCols;
+					AttrNumber *partColIdx;
+					Oid		   *partOperators;
+					int			ordNumCols;
+					AttrNumber *ordColIdx;
+					Oid		   *ordOperators;
+
+					window_pathkeys = make_pathkeys_for_window(root,
+															   wc,
+															   tlist);
+
+					/*
+					 * This is a bit tricky: we build a sort node even if we don't
+					 * really have to sort.  Even when no explicit sort is needed,
+					 * we need to have suitable resjunk items added to the input
+					 * plan's tlist for any partitioning or ordering columns that
+					 * aren't plain Vars.  (In theory, make_windowInputTargetList
+					 * should have provided all such columns, but let's not assume
+					 * that here.)	Furthermore, this way we can use existing
+					 * infrastructure to identify which input columns are the
+					 * interesting ones.
+					 */
+					if (window_pathkeys)
+					{
+						Sort	   *sort_plan;
+
+						sort_plan = make_sort_from_pathkeys(root,
+															result_plan,
+															window_pathkeys,
+															-1.0);
+						if (!pathkeys_contained_in(window_pathkeys,
+												   current_pathkeys))
+						{
+							/* we do indeed need to sort */
+							result_plan = (Plan *) sort_plan;
+							current_pathkeys = window_pathkeys;
+						}
+						/* In either case, extract the per-column information */
+						get_column_info_for_window(root, wc, tlist,
+												   sort_plan->numCols,
+												   sort_plan->sortColIdx,
+												   &partNumCols,
+												   &partColIdx,
+												   &partOperators,
+												   &ordNumCols,
+												   &ordColIdx,
+												   &ordOperators);
+					}
+					else
+					{
+						/* empty window specification, nothing to sort */
+						partNumCols = 0;
+						partColIdx = NULL;
+						partOperators = NULL;
+						ordNumCols = 0;
+						ordColIdx = NULL;
+						ordOperators = NULL;
+					}
 
-				/* ... and make the WindowAgg plan node */
-				result_plan = (Plan *)
-					make_windowagg(root,
-								   (List *) copyObject(window_tlist),
-								   wflists->windowFuncs[wc->winref],
-								   wc->winref,
-								   partNumCols,
-								   partColIdx,
-								   partOperators,
-								   ordNumCols,
-								   ordColIdx,
-								   ordOperators,
-								   wc->frameOptions,
-								   wc->startOffset,
-								   wc->endOffset,
-								   result_plan);
+					if (lnext(l))
+					{
+						/* Add the current WindowFuncs to the running tlist */
+						window_tlist = add_to_flat_tlist(window_tlist,
+											   wflists->windowFuncs[wc->winref]);
+					}
+					else
+					{
+						/* Install the original tlist in the topmost WindowAgg */
+						window_tlist = tlist;
+					}
+
+					/* ... and make the WindowAgg plan node */
+					result_plan = (Plan *)
+						make_windowagg(root,
+									   (List *) copyObject(window_tlist),
+									   wflists->windowFuncs[wc->winref],
+									   wc->winref,
+									   partNumCols,
+									   partColIdx,
+									   partOperators,
+									   ordNumCols,
+									   ordColIdx,
+									   ordOperators,
+									   wc->frameOptions,
+									   wc->startOffset,
+									   wc->endOffset,
+									   result_plan);
+				}
 			}
-		}
+
+			result_plan_list = lappend(result_plan_list, result_plan);
+		}						 /* foreach final_rel_list */
 	}							/* end of if (setOperations) */
 
-	/*
-	 * If there is a DISTINCT clause, add the necessary node(s).
-	 */
-	if (parse->distinctClause)
+	foreach(lc, result_plan_list)
 	{
-		double		dNumDistinctRows;
-		long		numDistinctRows;
+		result_plan = (Plan *) lfirst(lc);
 
 		/*
-		 * If there was grouping or aggregation, use the current number of
-		 * rows as the estimated number of DISTINCT rows (ie, assume the
-		 * result was already mostly unique).  If not, use the number of
-		 * distinct-groups calculated previously.
+		 * If there is a DISTINCT clause, add the necessary node(s).
 		 */
-		if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
-			dNumDistinctRows = result_plan->plan_rows;
-		else
-			dNumDistinctRows = dNumGroups;
-
-		/* Also convert to long int --- but 'ware overflow! */
-		numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
-
-		/* Choose implementation method if we didn't already */
-		if (!tested_hashed_distinct)
+		if (parse->distinctClause)
 		{
-			/*
-			 * At this point, either hashed or sorted grouping will have to
-			 * work from result_plan, so we pass that as both "cheapest" and
-			 * "sorted".
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   result_plan->plan_rows,
-									   result_plan->plan_width,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   current_pathkeys,
-									   dNumDistinctRows);
-		}
+			double		dNumDistinctRows;
+			long		numDistinctRows;
 
-		if (use_hashed_distinct)
-		{
-			/* Hashed aggregate plan --- no sort needed */
-			result_plan = (Plan *) make_agg(root,
-											result_plan->targetlist,
-											NIL,
-											AGG_HASHED,
-											NULL,
-										  list_length(parse->distinctClause),
-								 extract_grouping_cols(parse->distinctClause,
-													result_plan->targetlist),
-								 extract_grouping_ops(parse->distinctClause),
-											numDistinctRows,
-											result_plan);
-			/* Hashed aggregation produces randomly-ordered results */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Use a Unique node to implement DISTINCT.  Add an explicit sort
-			 * if we couldn't make the path come out the way the Unique node
-			 * needs it.  If we do have to sort, always sort by the more
-			 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
-			 * below.  However, for regular DISTINCT, don't sort now if we
-			 * don't have to --- sorting afterwards will likely be cheaper,
-			 * and also has the possibility of optimizing via LIMIT.  But for
-			 * DISTINCT ON, we *must* force the final sort now, else it won't
-			 * have the desired behavior.
+			 * If there was grouping or aggregation, use the current number of
+			 * rows as the estimated number of DISTINCT rows (ie, assume the
+			 * result was already mostly unique).  If not, use the number of
+			 * distinct-groups calculated previously.
 			 */
-			List	   *needed_pathkeys;
-
-			if (parse->hasDistinctOn &&
-				list_length(root->distinct_pathkeys) <
-				list_length(root->sort_pathkeys))
-				needed_pathkeys = root->sort_pathkeys;
+			if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
+				dNumDistinctRows = result_plan->plan_rows;
 			else
-				needed_pathkeys = root->distinct_pathkeys;
+				dNumDistinctRows = dNumGroups;
+
+			/* Also convert to long int --- but 'ware overflow! */
+			numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
+
+			/* Choose implementation method if we didn't already */
+			if (!tested_hashed_distinct)
+			{
+				/*
+				 * At this point, either hashed or sorted grouping will have to
+				 * work from result_plan, so we pass that as both "cheapest" and
+				 * "sorted".
+				 */
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   result_plan->plan_rows,
+										   result_plan->plan_width,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   current_pathkeys,
+										   dNumDistinctRows);
+			}
 
-			if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
+			if (use_hashed_distinct)
+			{
+				/* Hashed aggregate plan --- no sort needed */
+				result_plan = (Plan *) make_agg(root,
+												result_plan->targetlist,
+												NIL,
+												AGG_HASHED,
+												NULL,
+											  list_length(parse->distinctClause),
+									 extract_grouping_cols(parse->distinctClause,
+														result_plan->targetlist),
+									 extract_grouping_ops(parse->distinctClause),
+												numDistinctRows,
+												result_plan);
+				/* Hashed aggregation produces randomly-ordered results */
+				current_pathkeys = NIL;
+			}
+			else
 			{
-				if (list_length(root->distinct_pathkeys) >=
+				/*
+				 * Use a Unique node to implement DISTINCT.  Add an explicit sort
+				 * if we couldn't make the path come out the way the Unique node
+				 * needs it.  If we do have to sort, always sort by the more
+				 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
+				 * below.  However, for regular DISTINCT, don't sort now if we
+				 * don't have to --- sorting afterwards will likely be cheaper,
+				 * and also has the possibility of optimizing via LIMIT.  But for
+				 * DISTINCT ON, we *must* force the final sort now, else it won't
+				 * have the desired behavior.
+				 */
+				List	   *needed_pathkeys;
+
+				if (parse->hasDistinctOn &&
+					list_length(root->distinct_pathkeys) <
 					list_length(root->sort_pathkeys))
-					current_pathkeys = root->distinct_pathkeys;
+					needed_pathkeys = root->sort_pathkeys;
 				else
+					needed_pathkeys = root->distinct_pathkeys;
+
+				if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
 				{
-					current_pathkeys = root->sort_pathkeys;
-					/* Assert checks that parser didn't mess up... */
-					Assert(pathkeys_contained_in(root->distinct_pathkeys,
-												 current_pathkeys));
+					if (list_length(root->distinct_pathkeys) >=
+						list_length(root->sort_pathkeys))
+						current_pathkeys = root->distinct_pathkeys;
+					else
+					{
+						current_pathkeys = root->sort_pathkeys;
+						/* Assert checks that parser didn't mess up... */
+						Assert(pathkeys_contained_in(root->distinct_pathkeys,
+													 current_pathkeys));
+					}
+
+					result_plan = (Plan *) make_sort_from_pathkeys(root,
+																   result_plan,
+																current_pathkeys,
+																   -1.0);
 				}
 
+				result_plan = (Plan *) make_unique(result_plan,
+												   parse->distinctClause);
+				result_plan->plan_rows = dNumDistinctRows;
+				/* The Unique node won't change sort ordering */
+			}
+		}
+
+		/*
+		 * If ORDER BY was given and we were not able to make the plan come out in
+		 * the right order, add an explicit sort step.
+		 */
+		if (parse->sortClause)
+		{
+			if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+			{
 				result_plan = (Plan *) make_sort_from_pathkeys(root,
 															   result_plan,
-															current_pathkeys,
-															   -1.0);
+															 root->sort_pathkeys,
+															   limit_tuples);
+				current_pathkeys = root->sort_pathkeys;
 			}
-
-			result_plan = (Plan *) make_unique(result_plan,
-											   parse->distinctClause);
-			result_plan->plan_rows = dNumDistinctRows;
-			/* The Unique node won't change sort ordering */
 		}
-	}
 
-	/*
-	 * If ORDER BY was given and we were not able to make the plan come out in
-	 * the right order, add an explicit sort step.
-	 */
-	if (parse->sortClause)
-	{
-		if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+		/*
+		 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
+		 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
+		 * If there are only non-locking rowmarks, they should be handled by the
+		 * ModifyTable node instead.)
+		 */
+		if (parse->rowMarks)
 		{
-			result_plan = (Plan *) make_sort_from_pathkeys(root,
-														   result_plan,
-														 root->sort_pathkeys,
-														   limit_tuples);
-			current_pathkeys = root->sort_pathkeys;
-		}
-	}
+			result_plan = (Plan *) make_lockrows(result_plan,
+												 root->rowMarks,
+												 SS_assign_special_param(root));
 
-	/*
-	 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
-	 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
-	 * If there are only non-locking rowmarks, they should be handled by the
-	 * ModifyTable node instead.)
-	 */
-	if (parse->rowMarks)
-	{
-		result_plan = (Plan *) make_lockrows(result_plan,
-											 root->rowMarks,
-											 SS_assign_special_param(root));
+			/*
+			 * The result can no longer be assumed sorted, since locking might
+			 * cause the sort key columns to be replaced with new values.
+			 */
+			current_pathkeys = NIL;
+		}
 
 		/*
-		 * The result can no longer be assumed sorted, since locking might
-		 * cause the sort key columns to be replaced with new values.
+		 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
 		 */
-		current_pathkeys = NIL;
-	}
+		if (limit_needed(parse))
+		{
+			result_plan = (Plan *) make_limit(result_plan,
+											  parse->limitOffset,
+											  parse->limitCount,
+											  offset_est,
+											  count_est);
+		}
 
-	/*
-	 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
-	 */
-	if (limit_needed(parse))
-	{
-		result_plan = (Plan *) make_limit(result_plan,
-										  parse->limitOffset,
-										  parse->limitCount,
-										  offset_est,
-										  count_est);
-	}
+		lfirst(lc) = result_plan;
+	} /* foreach result_plan_list */
 
 	/*
 	 * Return the actual output ordering in query_pathkeys for possible use by
@@ -1999,7 +2019,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	 */
 	root->query_pathkeys = current_pathkeys;
 
-	return result_plan;
+	/* If there is only one plan, just return it. */
+	if (list_length(result_plan_list) == 1)
+		return (Plan *) linitial(result_plan_list);
+
+	/*
+	 * Otherwise we'd better add an AlternativePlan node to allow the executor
+	 * to decide which plan to use.
+	 */
+	else
+		return (Plan *) make_alternativeplan(result_plan_list);
 }
 
 /*
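(Editorial sketch, not part of the patch.) The tail of grouping_planner() above returns either the single plan or an AlternativePlan wrapper for the executor to pick from. A minimal standalone model of such a pick might look like the following. The CandidatePlan struct, the needs_empty_trigger_queue flag and choose_plan() are hypothetical names, and the rule that a join-removed plan is only usable while no after-triggers are pending is an assumption inferred from the AfterTriggerQueueIsEmpty() declaration, not something this hunk shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one alternative; not the patch's Plan struct. */
typedef struct CandidatePlan
{
	const char *label;
	double		total_cost;
	/* Assumed: set for plans that had joins removed via foreign keys. */
	bool		needs_empty_trigger_queue;
} CandidatePlan;

/*
 * Return the cheapest candidate that is usable in the current state,
 * or NULL if no candidate qualifies.
 */
const CandidatePlan *
choose_plan(const CandidatePlan *plans, size_t nplans,
			bool trigger_queue_empty)
{
	const CandidatePlan *best = NULL;
	size_t		i;

	assert(plans != NULL || nplans == 0);

	for (i = 0; i < nplans; i++)
	{
		const CandidatePlan *p = &plans[i];

		/* Skip plans whose precondition does not hold right now. */
		if (p->needs_empty_trigger_queue && !trigger_queue_empty)
			continue;

		if (best == NULL || p->total_cost < best->total_cost)
			best = p;
	}

	return best;
}
```

With one all-purpose plan and one cheaper join-removed plan, this model picks the cheaper plan only when the queue is reported empty, and otherwise falls back to the all-purpose plan.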
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 7703946..c0b7a34 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -435,6 +435,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 	 */
 	switch (nodeTag(plan))
 	{
+		case T_AlternativePlan:
+			{
+				AlternativePlan *aplan = (AlternativePlan *) plan;
+				ListCell *lc;
+				foreach(lc, aplan->planList)
+				{
+					/* set_plan_refs() may return a different node; store it back */
+					lfirst(lc) = set_plan_refs(root, (Plan *) lfirst(lc), rtoffset);
+				}
+			}
+			break;
 		case T_SeqScan:
 			{
 				SeqScan    *splan = (SeqScan *) plan;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 265c865..855bc96 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -25,7 +25,9 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_constraint.h"
 #include "catalog/heap.h"
+#include "catalog/pg_type.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -38,6 +40,7 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -89,6 +92,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *fkinfos = NIL;
+	Relation	fkeyRel;
+	Relation	fkeyRelIdx;
+	ScanKeyData fkeyScankey;
+	SysScanDesc fkeyScan;
+	HeapTuple	tuple;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -384,6 +393,111 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	heap_close(relation, NoLock);
 
+	/* load foreign key constraints */
+	ScanKeyInit(&fkeyScankey,
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relationObjectId));
+
+	fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
+	fkeyRelIdx = index_open(ConstraintRelidIndexId, AccessShareLock);
+	fkeyScan = systable_beginscan_ordered(fkeyRel, fkeyRelIdx, NULL, 1, &fkeyScankey);
+
+	while ((tuple = systable_getnext_ordered(fkeyScan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
+		ForeignKeyInfo *fkinfo;
+		Datum		adatum;
+		bool		isNull;
+		ArrayType  *arr;
+		int			nelements;
+
+		/* skip if not a foreign key */
+		if (con->contype != CONSTRAINT_FOREIGN)
+			continue;
+
+		/* we're not interested unless the fkey has been validated */
+		if (!con->convalidated)
+			continue;
+
+		fkinfo = (ForeignKeyInfo *) palloc(sizeof(ForeignKeyInfo));
+		fkinfo->conindid = con->conindid;
+		fkinfo->confrelid = con->confrelid;
+		fkinfo->convalidated = con->convalidated;
+		fkinfo->conrelid = con->conrelid;
+		fkinfo->confupdtype = con->confupdtype;
+		fkinfo->confdeltype = con->confdeltype;
+		fkinfo->confmatchtype = con->confmatchtype;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "conkey is not a 1-D smallint array");
+
+		fkinfo->conkey = (int16 *) ARR_DATA_PTR(arr);
+		fkinfo->conncols = nelements;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_confkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null confkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "confkey is not a 1-D smallint array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of confkey elements does not equal conkey elements");
+
+		fkinfo->confkey = (int16 *) ARR_DATA_PTR(arr);
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conpfeqop,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conpfeqop for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != OIDOID)
+			elog(ERROR, "conpfeqop is not a 1-D Oid array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of conpfeqop elements does not equal conkey elements");
+
+		fkinfo->conpfeqop = (Oid *) ARR_DATA_PTR(arr);
+
+		fkinfos = lappend(fkinfos, fkinfo);
+	}
+
+	rel->fklist = fkinfos;
+	systable_endscan_ordered(fkeyScan);
+	index_close(fkeyRelIdx, AccessShareLock);
+	heap_close(fkeyRel, AccessShareLock);
+
 	/*
 	 * Allow a plugin to editorialize on the info we obtained from the
 	 * catalogs.  Actions might include altering the assumed relation size,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..0be29e6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -115,6 +115,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->lateral_relids = NULL;
 	rel->lateral_referencers = NULL;
 	rel->indexlist = NIL;
+	rel->fklist = NIL;
 	rel->pages = 0;
 	rel->tuples = 0;
 	rel->allvisfrac = 0;
@@ -127,6 +128,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->baserestrictcost.startup = 0;
 	rel->baserestrictcost.per_tuple = 0;
 	rel->joininfo = NIL;
+	rel->removal_flags = PLAN_SUITABILITY_ALL_PURPOSE;
 	rel->has_eclass_joins = false;
 
 	/* Check type of rtable entry */
@@ -377,6 +379,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->lateral_relids = NULL;
 	joinrel->lateral_referencers = NULL;
 	joinrel->indexlist = NIL;
+	joinrel->fklist = NIL;
 	joinrel->pages = 0;
 	joinrel->tuples = 0;
 	joinrel->allvisfrac = 0;
@@ -389,6 +392,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrictcost.startup = 0;
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
+	joinrel->removal_flags = PLAN_SUITABILITY_ALL_PURPOSE;
 	joinrel->has_eclass_joins = false;
 
 	/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 818c2f6..115e398 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -916,6 +916,33 @@ get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 	ReleaseSysCache(tp);
 }
 
+/*
+ * get_attnotnull
+ *
+ *		Given the relation id and the attribute number,
+ *		return the "attnotnull" field from the attribute relation.
+ */
+bool
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache2(ATTNUM,
+						 ObjectIdGetDatum(relid),
+						 Int16GetDatum(attnum));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+		bool		result;
+
+		result = att_tup->attnotnull;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return false;
+}
+
 /*				---------- COLLATION CACHE ----------					 */
 
 /*
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index d0c0dcc..2ae3ea0 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -181,6 +181,7 @@ extern void ExecBSTruncateTriggers(EState *estate,
 extern void ExecASTruncateTriggers(EState *estate,
 					   ResultRelInfo *relinfo);
 
+extern bool AfterTriggerQueueIsEmpty(void);
 extern void AfterTriggerBeginXact(void);
 extern void AfterTriggerBeginQuery(void);
 extern void AfterTriggerEndQuery(EState *estate);
diff --git a/src/include/executor/nodeAlternativePlan.h b/src/include/executor/nodeAlternativePlan.h
new file mode 100644
index 0000000..092f4ef
--- /dev/null
+++ b/src/include/executor/nodeAlternativePlan.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeAlternativePlan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeAlternativePlan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODEALTERNATIVEPLAN_H
+#define NODEALTERNATIVEPLAN_H
+
+#include "nodes/execnodes.h"
+
+extern PlanState *ExecInitAlternativePlan(AlternativePlan *node,
+						EState *estate, int eflags);
+/*
+ * Note that this node is only ever seen during initialization of a plan and
+ * it has no state type.
+ */
+#endif   /* NODEALTERNATIVEPLAN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 97ef0fc..668d426 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -77,6 +77,7 @@ typedef enum NodeTag
 	T_SetOp,
 	T_LockRows,
 	T_Limit,
+	T_AlternativePlan,
 	/* these aren't subclasses of Plan: */
 	T_NestLoopParam,
 	T_PlanRowMark,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..3018256 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -734,6 +734,10 @@ typedef enum RTEKind
 	RTE_CTE						/* common table expr (WITH list element) */
 } RTEKind;
 
+/* Bit flags to mark suitability of plans */
+#define PLAN_SUITABILITY_ALL_PURPOSE		0
+#define PLAN_SUITABILITY_FK_TRIGGER_EMPTY	1
+
 typedef struct RangeTblEntry
 {
 	NodeTag		type;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 316c9ce..a3d3127 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,8 +70,11 @@ typedef struct PlannedStmt
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
+	int			suitableFor; /* under which conditions can this plan be used */
+
 	bool		hasRowSecurity;	/* row security applied? */
 
+
 } PlannedStmt;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -767,6 +770,20 @@ typedef struct LockRows
 	int			epqParam;		/* ID of Param for EvalPlanQual re-eval */
 } LockRows;
 
+
+/* ----------------
+ *		alternative plan node
+ *
+ * Stores a list of alternative plans and one
+ * all purpose plan.
+ * ----------------
+ */
+typedef struct AlternativePlan
+{
+	Plan		plan;
+	List	   *planList;
+} AlternativePlan;
+
 /* ----------------
  *		limit node
  *
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6845a40..d94339f 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -95,6 +95,8 @@ typedef struct PlannerGlobal
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
+	int			suitableFor; /* under which conditions can this plan be used */
+
 	Index		lastPHId;		/* highest PlaceHolderVar ID assigned */
 
 	Index		lastRowMarkId;	/* highest PlanRowMark ID assigned */
@@ -103,6 +105,7 @@ typedef struct PlannerGlobal
 
 	bool		hasRowSecurity;	/* row security applied? */
 
+
 } PlannerGlobal;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -359,6 +362,8 @@ typedef struct PlannerInfo
  *		lateral_referencers - relids of rels that reference this one laterally
  *		indexlist - list of IndexOptInfo nodes for relation's indexes
  *					(always NIL if it's not a table)
+ *		fklist - list of ForeignKeyInfo's for relation's foreign key
+ *					constraints. (always NIL if it's not a table)
  *		pages - number of disk pages in relation (zero if not a table)
  *		tuples - number of tuples in relation (not considering restrictions)
  *		allvisfrac - fraction of disk pages that are marked all-visible
@@ -452,6 +457,7 @@ typedef struct RelOptInfo
 	Relids		lateral_relids; /* minimum parameterization of rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
 	List	   *indexlist;		/* list of IndexOptInfo */
+	List	   *fklist;			/* list of ForeignKeyInfo */
 	BlockNumber pages;			/* size estimates derived from pg_class */
 	double		tuples;
 	double		allvisfrac;
@@ -469,6 +475,8 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;		/* cost of evaluating the above */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
+	int			removal_flags;		/* it may be possible to not bother joining
+									 * this relation at all */
 	bool		has_eclass_joins;		/* T means joininfo is incomplete */
 } RelOptInfo;
 
@@ -542,6 +550,51 @@ typedef struct IndexOptInfo
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 } IndexOptInfo;
 
+/*
+ * ForeignKeyInfo
+ *		Used to store pg_constraint records for foreign key constraints for use
+ *		by the planner.
+ *
+ *		conindid - The index which supports the foreign key
+ *
+ *		confrelid - The relation that is referenced by this foreign key
+ *
+ *		convalidated - True if the foreign key has been validated.
+ *
+ *		conrelid - The Oid of the relation that the foreign key belongs to
+ *
+ *		confupdtype - ON UPDATE action for when the referenced table is updated
+ *
+ *		confdeltype - ON DELETE action, controls what to do when a record is
+ *					deleted from the referenced table.
+ *
+ *		confmatchtype - foreign key match type, e.g. MATCH FULL, MATCH PARTIAL
+ *
+ *		conncols - Number of columns defined in the foreign key
+ *
+ *		conkey - An array of conncols elements to store the varattno of the
+ *					columns on the referencing side of the foreign key
+ *
+ *		confkey - An array of conncols elements to store the varattno of the
+ *					columns on the referenced side of the foreign key
+ *
+ *		conpfeqop - An array of conncols elements to store the operators for
+ *					PK = FK comparisons
+ */
+typedef struct ForeignKeyInfo
+{
+	Oid			conindid;		/* index supporting this constraint */
+	Oid			confrelid;		/* relation referenced by foreign key */
+	bool		convalidated;	/* constraint has been validated? */
+	Oid			conrelid;		/* relation this constraint constrains */
+	char		confupdtype;	/* foreign key's ON UPDATE action */
+	char		confdeltype;	/* foreign key's ON DELETE action */
+	char		confmatchtype;	/* foreign key's match type */
+	int			conncols;		/* number of referenced columns */
+	int16	   *conkey;			/* columns of conrelid that the constraint applies to */
+	int16	   *confkey;		/* columns of confrelid that foreign key references */
+	Oid		   *conpfeqop;		/* Operator list for comparing PK to FK */
+} ForeignKeyInfo;
 
 /*
  * EquivalenceClasses
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..7b040fa 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -37,7 +37,8 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
 
-extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
+extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist,
+								int removal_flags);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
@@ -119,6 +120,8 @@ extern List *generate_join_implied_equalities(PlannerInfo *root,
 								 Relids join_relids,
 								 Relids outer_relids,
 								 RelOptInfo *inner_rel);
+extern Oid select_equality_operator(EquivalenceClass *ec, Oid lefttype,
+								 Oid righttype);
 extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 						   AppendRelInfo *appinfo,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 082f7d7..7bcd93a 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -27,8 +27,9 @@ typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
 /*
  * prototypes for plan/planmain.c
  */
-extern RelOptInfo *query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra);
+extern List *query_planner(PlannerInfo *root, List *tlist,
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool all_purpose_plan_only);
 
 /*
  * prototypes for plan/planagg.c
@@ -73,6 +74,7 @@ extern Group *make_group(PlannerInfo *root, List *tlist, List *qual,
 extern Plan *materialize_finished_plan(Plan *subplan);
 extern Unique *make_unique(Plan *lefttree, List *distinctList);
 extern LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
+extern AlternativePlan *make_alternativeplan(List *planlist);
 extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,
 		   int64 offset_est, int64 count_est);
 extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 2f5ede1..14e64fc 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -68,6 +68,7 @@ extern Oid	get_atttype(Oid relid, AttrNumber attnum);
 extern int32 get_atttypmod(Oid relid, AttrNumber attnum);
 extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 					  Oid *typid, int32 *typmod, Oid *collid);
+extern bool get_attnotnull(Oid relid, AttrNumber attnum);
 extern char *get_collation_name(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 2501184..e485554 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3276,6 +3276,171 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 (1 row)
 
 rollback;
+begin work;
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+ id | b_id 
+----+------
+  3 |    2
+(1 row)
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (val = 10)
+(6 rows)
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (id = val)
+(6 rows)
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+         QUERY PLAN         
+----------------------------
+ Hash Join
+   Hash Cond: (a.id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+          QUERY PLAN          
+------------------------------
+ Hash Right Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+        QUERY PLAN         
+---------------------------
+ Nested Loop
+   ->  Seq Scan on c
+   ->  Materialize
+         ->  Seq Scan on a
+(4 rows)
+
+set constraints b_c_id_fkey deferred;
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+explain (costs off)
+execute ab;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+explain (costs off)
+execute ab;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+rollback;
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 718e1d9..c3ee72e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -977,6 +977,89 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 
 rollback;
 
+begin work;
+
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+
+set constraints b_c_id_fkey deferred;
+
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+rollback;
+
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);

#45

Robert Haas

robertmhaas@gmail.com

about 11 years ago

In reply to: David Rowley (#44)

Re: Removing INNER JOINs

On Thu, Jan 8, 2015 at 6:31 AM, David Rowley <dgrowleyml@gmail.com> wrote:

I'd be keen to know what people's thoughts are about the nodeAlternativePlan
only surviving until the plan is initialised.

I find it scary, although sometimes I am easily frightened.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: Tom Lane (#38)

Re: Removing INNER JOINs

On 12/3/14 1:08 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Do you need to plan for every combination, where some joins are removed
and some are not?

I would vote for just having two plans and one switch node. To exploit
any finer grain, we'd have to have infrastructure that would let us figure
out *which* constraints pending triggers might indicate transient
invalidity of, and that doesn't seem likely to be worth the trouble.

In the interest of keeping the first pass simple... what if there was simply a switch node in front of every join that could be removable? That means you'd still be stuck with the overall plan you got from not removing anything, but simply eliminating the access to the relation(s) might be a big win in many cases, and presumably this would be a lot easier to code.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

#47

David Rowley

dgrowleyml@gmail.com

almost 11 years ago

In reply to: Jim Nasby (#46)

Re: Removing INNER JOINs

On 13 January 2015 at 11:29, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/3/14 1:08 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Do you need to plan for every combination, where some joins are removed
and some are not?

I would vote for just having two plans and one switch node. To exploit
any finer grain, we'd have to have infrastructure that would let us figure
out *which* constraints pending triggers might indicate transient
invalidity of, and that doesn't seem likely to be worth the trouble.

In the interest of keeping the first pass simple... what if there was
simply a switch node in front of every join that could be removable? That
means you'd still be stuck with the overall plan you got from not removing
anything, but simply eliminating the access to the relation(s) might be a
big win in many cases, and presumably this would be a lot easier to code.

I can't quite get my head around what you mean here, as the idea sounds
quite similar to something that's been discussed already and ruled out.
Say we're joining relation a to relation b and the plan chosen is a merge
join. If we put some special node as the parent of the merge join, then how
will we know whether to skip any sorts that are there only for the merge
join? Or perhaps the planner chose an index scan as a sorted path which,
now that the join is removed, could become a faster seqscan.
The whole plan switching node discussion came from this exact problem.
Nobody seemed to like the non-optimal plan that was not properly optimised
for having the relation removed.

It also seems that transitioning through needless nodes comes at a cost.
This is why I quite liked the Alternative Plan node idea, as it allowed me
to skip over the alternative plan node at plan initialisation.

Regards

David Rowley

#48

David Rowley

dgrowleyml@gmail.com

almost 11 years ago

In reply to: Robert Haas (#45)

Re: Removing INNER JOINs

On 12 January 2015 at 15:57, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 8, 2015 at 6:31 AM, David Rowley <dgrowleyml@gmail.com> wrote:

I'd be keen to know what people's thoughts are about the nodeAlternativePlan
only surviving until the plan is initialised.

I find it scary, although sometimes I am easily frightened.

OK, remember I'm not actually modifying the plan like I was in the earlier
version of the patch. The Alternative Plan node simply initialises the
correct plan and, instead of returning its own initialised state, it
returns the initialised state of the selected plan's root node.
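
To illustrate the selection step described above, here is a minimal standalone sketch. It does not use the real executor types: SketchPlan, SketchAlternativePlan and sketch_init_alternative_plan() are hypothetical stand-ins, and the flag values merely mirror the PLAN_SUITABILITY_* defines in the patch. The caller is assumed to pass in the result of something like AfterTriggerQueueIsEmpty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the patch's PLAN_SUITABILITY_* flags. */
#define SUITABILITY_ALL_PURPOSE			0
#define SUITABILITY_FK_TRIGGER_EMPTY	1

/* Hypothetical, simplified plan: just a label and its suitability flags. */
typedef struct SketchPlan
{
	const char *label;			/* what EXPLAIN would show as the root */
	int			suitableFor;	/* conditions this plan relies on */
} SketchPlan;

/* Hypothetical alternative-plan node: only a list of candidate plans. */
typedef struct SketchAlternativePlan
{
	SketchPlan **plans;
	size_t		nplans;
} SketchAlternativePlan;

/*
 * Choose a plan once, at init time.  A specialised plan is usable only if
 * every condition it relies on currently holds; otherwise fall back to the
 * all-purpose plan.  The chosen plan is returned directly, so the
 * alternative node itself never appears in the initialised tree.
 */
const SketchPlan *
sketch_init_alternative_plan(const SketchAlternativePlan *node,
							 bool trigger_queue_empty)
{
	int			holds = trigger_queue_empty ? SUITABILITY_FK_TRIGGER_EMPTY : 0;
	const SketchPlan *fallback = NULL;
	size_t		i;

	assert(node != NULL && node->nplans > 0);

	for (i = 0; i < node->nplans; i++)
	{
		const SketchPlan *p = node->plans[i];

		if (p->suitableFor == SUITABILITY_ALL_PURPOSE)
		{
			/* remember the first all-purpose plan as the fallback */
			if (fallback == NULL)
				fallback = p;
		}
		else if ((p->suitableFor & ~holds) == 0)
			return p;			/* all required conditions hold */
	}

	/* the list is assumed to always contain an all-purpose plan */
	assert(fallback != NULL);
	return fallback;
}
```

Given a list holding an all-purpose hash join plan and a stripped-down seq scan plan, this returns the seq scan plan when the trigger queue is empty and the hash join plan otherwise.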

I have to admit, it didn't really become clear in my head if the frenzy of
discussion above gave any sort of indication that this "Alternative plan
node" would remain and be shown in the EXPLAIN output, or the appropriate
plan would be selected during plan initialisation and the new root node
would become that of the selected plan. When I was implementing what was
discussed, I decided that it would be better to choose the correct plan
during initialisation rather than transitioning through the "Alternative
plan node" for each row. As proved already on this thread, transitioning
through needless nodes adds needless executor time overhead.

Also if we kept the alternative plan node, then I think the explain would
look rather weird and frighteningly complex, as it would effectively be 2
plans in 1.

I'm actually quite happy with how simple the executor changes became. They're
far simpler and cleaner than the node stripping code that I had in an
earlier version. The parts of the patch that I'm concerned might raise a
few eyebrows are the changes to query_planner() as it now returns a List of
RelOptInfo instead of a RelOptInfo.

Regards

David Rowley

#49

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 11 years ago

In reply to: David Rowley (#47)

Re: Removing INNER JOINs

On 1/13/15 5:02 AM, David Rowley wrote:

I can't quite get my head around what you mean here, as the idea sounds quite similar to something that's been discussed already and ruled out.
Say we're joining relation a to relation b and the plan chosen is a merge join. If we put some special node as the parent of the merge join, then how will we know whether to skip any sorts that are there only for the merge join? Or perhaps the planner chose an index scan as a sorted path which, now that the join is removed, could become a faster seqscan. The whole plan switching node discussion came from this exact problem. Nobody seemed to like the non-optimal plan that was not properly optimised for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be free to do whatever you wanted in the alternate:

-> blah blah
-> Alternate
-> Merge join
...
-> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The other advantage is if you can't eliminate the join to table A at runtime you could still eliminate table B, whereas a top-level Alternate node doesn't have that flexibility.

This does have a disadvantage of creating more plan variations to consider. With a single top-level Alternate node there's only one other option. I believe multiple Alternates would create more options to consider.

Ultimately, unless this is easier to code than a top-level alternate, it's probably not worth pursuing.

It also seems that transitioning through needless nodes comes at a cost. This is why I quite liked the Alternative Plan node idea, as it allowed me to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a top-level Alternate, because you wouldn't be duplicating all the stuff above the joins. That said, I rather doubt it's worth worrying about the cost to init; won't it be completely swamped by extra planning cost, no matter how we go about this?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

#50

David Rowley

dgrowleyml@gmail.com

almost 11 years ago

In reply to: Jim Nasby (#49)

1 attachment(s)

Re: Removing INNER JOINs

On 15 January 2015 at 08:36, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 1/13/15 5:02 AM, David Rowley wrote:

I can't quite get my head around what you mean here, as the idea sounds
quite similar to something that's been discussed already and ruled out.
Say we're joining relation a to relation b and the plan chosen is a merge
join. If we put some special node as the parent of the merge join, then how
will we know whether to skip any sorts that are there only for the merge
join? Or perhaps the planner chose an index scan as a sorted path which,
now that the join is removed, could become a faster seqscan.
The whole plan switching node discussion came from this exact problem.
Nobody seemed to like the non-optimal plan that was not properly optimised
for having the relation removed.

In my mind this is the same as a root level Alternative Plan, so you'd be
free to do whatever you wanted in the alternate:

-> blah blah
-> Alternate
-> Merge join
...
-> SeqScan

I'm guessing this would be easier to code, but that's just a guess. The
other advantage is if you can't eliminate the join to table A at runtime
you could still eliminate table B, whereas a top-level Alternate node
doesn't have that flexibility.

This does have a disadvantage of creating more plan variations to
consider. With a single top-level Alternate node there's only one other
option. I believe multiple Alternates would create more options to consider.

Ultimately, unless this is easier to code than a top-level alternate, it's
probably not worth pursuing.

I think it's probably possible to do this, but I think it would require
calling make_one_rel() with every combination of the possibly removable
relations included and not included in the join list. I'm thinking this
could end up being a lot of work, as the number of calls to make_one_rel()
would be 2^N, where N is the number of relations that may be removable.
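
A quick way to sanity-check how fast this grows: each removable relation is either kept in the join list or left out, so the variants can be enumerated as bitmasks, which gives 2^N of them. The sketch below only counts them; count_join_list_variants() is a hypothetical helper, and no real make_one_rel() call is made.

```c
#include <assert.h>

/*
 * Count the distinct join lists obtained by keeping or dropping each of
 * n_removable relations.  Bit i of the mask set means "relation i stays
 * in the join list"; each mask would correspond to one planning pass.
 */
unsigned long
count_join_list_variants(unsigned int n_removable)
{
	unsigned long count = 0;
	unsigned long mask;

	/* keep the shift below well defined for a 32-bit unsigned long */
	assert(n_removable < 32);

	for (mask = 0; mask < (1UL << n_removable); mask++)
		count++;				/* one hypothetical planning pass per mask */

	return count;
}
```

Three removable relations already give 8 variants and ten give 1024, compared with the 2 plans needed when only "everything removed" and "nothing removed" are considered.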

My line of thought was more that the backup/all-purpose plan will only be
used in very rare cases: either when an FK has been deferred, or when the
query is executed from within a volatile function called by an UPDATE
statement which has just modified the table, causing a foreign key trigger
to be queued. I'm willing to bet someone does this somewhere in the world,
but the query that's run would also have to have a removable join. (One of
the regression tests I've added exercises this.)

For that reason I thought it was best to generate only 2 plans. One with
*all* possible removable rels removed, and a backup one with nothing
removed which will be executed if there are any FK triggers queued up.

It also seems that transitioning through needless nodes comes at a cost.
This is why I quite liked the Alternative Plan node idea, as it allowed me
to skip over the alternative plan node at plan initialisation.

For init I would expect this to result in a smaller number of nodes than a
top-level Alternate, because you wouldn't be duplicating all the stuff
above the joins. That said, I rather doubt it's worth worrying about the
cost to init; won't it be completely swamped by extra planning cost, no
matter how we go about this?

I'm not worried about the cost of the decision at plan init time. I was
just confused about what Tom was recommending. I couldn't quite decide from
his email if he meant to keep the alternative plan node around so that the
executor must transition through it for each row, or to just choose the
proper plan at executor init and return the actual root of the selected
plan instead of returning the initialised AlternativePlan node (see
nodeAlternativePlan.c).

The two ways of doing this look massively different in the EXPLAIN
output. With the method the patch currently implements, only 1 of the 2
alternative plans is seen by EXPLAIN. This is because I've coded
ExecInitAlternativePlan() to return the root node of only 1 of the 2 plans.
If I had kept the AlternativePlan node around then the EXPLAIN output would
have 2 plans, both sitting under the AlternativePlan node.

I've attached a rebased patch which is based on master as of today.

Any comments/reviews are welcome.

Regards

David Rowley

Attachments:

inner_join_removals_2014-03-16_6c2f36d.patch (application/octet-stream)

diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index e491c5b..f55acd2 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3899,6 +3899,17 @@ afterTriggerInvokeEvents(AfterTriggerEventList *events,
 	return all_fired;
 }
 
+/* ----------
+ * AfterTriggerQueueIsEmpty()
+ *
+ *	True if there are no pending triggers in the queue.
+ * ----------
+ */
+bool
+AfterTriggerQueueIsEmpty(void)
+{
+	return (afterTriggers.query_depth == -1 && afterTriggers.events.head == NULL);
+}
 
 /* ----------
  * AfterTriggerBeginXact()
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index af707b0..bfbd5b3 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -14,8 +14,8 @@ include $(top_builddir)/src/Makefile.global
 
 OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \
        execProcnode.o execQual.o execScan.o execTuples.o \
-       execUtils.o functions.o instrument.o nodeAppend.o nodeAgg.o \
-       nodeBitmapAnd.o nodeBitmapOr.o \
+       execUtils.o functions.o instrument.o nodeAlternativePlan.o nodeAppend.o \
+       nodeAgg.o nodeBitmapAnd.o nodeBitmapOr.o \
        nodeBitmapHeapscan.o nodeBitmapIndexscan.o nodeCustom.o nodeHash.o \
        nodeHashjoin.o nodeIndexscan.o nodeIndexonlyscan.o \
        nodeLimit.o nodeLockRows.o \
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index 9892499..523e187 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -79,6 +79,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeAgg.h"
+#include "executor/nodeAlternativePlan.h"
 #include "executor/nodeAppend.h"
 #include "executor/nodeBitmapAnd.h"
 #include "executor/nodeBitmapHeapscan.h"
@@ -147,6 +148,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
 			/*
 			 * control nodes
 			 */
+		case T_AlternativePlan:
+			result = (PlanState *) ExecInitAlternativePlan((AlternativePlan *)node,
+												  estate, eflags);
+			break;
+
 		case T_Result:
 			result = (PlanState *) ExecInitResult((Result *) node,
 												  estate, eflags);
diff --git a/src/backend/executor/nodeAlternativePlan.c b/src/backend/executor/nodeAlternativePlan.c
new file mode 100644
index 0000000..cafe33a
--- /dev/null
+++ b/src/backend/executor/nodeAlternativePlan.c
@@ -0,0 +1,51 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeAlternativePlan.c
+ *	  Node to support storage of alternative plans.
+ *
+ *	  Note that this node is rather special as it only exists while the plan
+ *	  is being initialised.
+ *
+ *	  When the initialization method is called for this node, a decision is
+ *	  made about which plan should be initialized; the code here then calls
+ *	  the initialize method on the selected plan and returns the state value
+ *	  from the root node of that plan.
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/executor/nodeAlternativePlan.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/trigger.h"
+
+#include "executor/executor.h"
+#include "executor/nodeAlternativePlan.h"
+
+PlanState *
+ExecInitAlternativePlan(AlternativePlan *node, EState *estate, int eflags)
+{
+	/*
+	 * If we have items in the fk trigger queue, then we'd better use the
+	 * all purpose plan. Since an AlternativePlan node has no state, we simply
+	 * initialize the root node of the selected plan. This means that the
+	 * AlternativePlan node is *never* seen in EXPLAIN or EXPLAIN ANALYZE.
+	 */
+	if (!AfterTriggerQueueIsEmpty())
+		return (PlanState *) ExecInitNode((Plan *) list_nth(node->planList, 1),
+											estate, eflags);
+
+	/*
+	 * Otherwise we initialize the root node of the optimized plan and return
+	 * that.
+	 */
+	else
+		return (PlanState *) ExecInitNode((Plan *) linitial(node->planList),
+											estate, eflags);
+}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 3c6a964..f23e761 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,6 +94,7 @@ _copyPlannedStmt(const PlannedStmt *from)
 	COPY_NODE_FIELD(invalItems);
 	COPY_SCALAR_FIELD(nParamExec);
 	COPY_SCALAR_FIELD(hasRowSecurity);
+	COPY_SCALAR_FIELD(suitableFor);
 
 	return newnode;
 }
@@ -965,6 +966,16 @@ _copyLimit(const Limit *from)
 	return newnode;
 }
 
+static AlternativePlan *
+_copyAlternativePlan(const AlternativePlan *from)
+{
+	AlternativePlan *newnode = makeNode(AlternativePlan);
+
+	COPY_NODE_FIELD(planList);
+
+	return newnode;
+}
+
 /*
  * _copyNestLoopParam
  */
@@ -4130,6 +4141,9 @@ copyObject(const void *from)
 		case T_Limit:
 			retval = _copyLimit(from);
 			break;
+		case T_AlternativePlan:
+			retval = _copyAlternativePlan(from);
+			break;
 		case T_NestLoopParam:
 			retval = _copyNestLoopParam(from);
 			break;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 385b289..440a7e2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -256,6 +256,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
 	WRITE_NODE_FIELD(invalItems);
 	WRITE_INT_FIELD(nParamExec);
 	WRITE_BOOL_FIELD(hasRowSecurity);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 /*
@@ -1723,6 +1724,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
 	WRITE_UINT_FIELD(lastRowMarkId);
 	WRITE_BOOL_FIELD(transientPlan);
 	WRITE_BOOL_FIELD(hasRowSecurity);
+	WRITE_INT_FIELD(suitableFor);
 }
 
 static void
@@ -1809,6 +1811,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	/* we don't try to print fdwroutine or fdw_private */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_NODE_FIELD(joininfo);
+	WRITE_INT_FIELD(removal_flags);
 	WRITE_BOOL_FIELD(has_eclass_joins);
 }
 
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 58d78e6..69990a2 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,7 +97,8 @@ static void set_cte_pathlist(PlannerInfo *root, RelOptInfo *rel,
 				 RangeTblEntry *rte);
 static void set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					   RangeTblEntry *rte);
-static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
+static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist,
+					   int removal_flags);
 static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
 						  pushdown_safety_info *safetyInfo);
 static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
@@ -122,7 +123,7 @@ static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
  *	  single rel that represents the join of all base rels in the query.
  */
 RelOptInfo *
-make_one_rel(PlannerInfo *root, List *joinlist)
+make_one_rel(PlannerInfo *root, List *joinlist, int removal_flags)
 {
 	RelOptInfo *rel;
 	Index		rti;
@@ -142,7 +143,8 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 		Assert(brel->relid == rti);		/* sanity check on array */
 
 		/* ignore RTEs that are "other rels" */
-		if (brel->reloptkind != RELOPT_BASEREL)
+		if (brel->reloptkind != RELOPT_BASEREL ||
+			brel->removal_flags & removal_flags)
 			continue;
 
 		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
@@ -157,12 +159,13 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	/*
 	 * Generate access paths for the entire join tree.
 	 */
-	rel = make_rel_from_joinlist(root, joinlist);
+	rel = make_rel_from_joinlist(root, joinlist, removal_flags);
+
 
 	/*
 	 * The result should join all and only the query's base rels.
 	 */
-	Assert(bms_equal(rel->relids, root->all_baserels));
+	Assert(bms_is_subset(root->all_baserels, rel->relids));
 
 	return rel;
 }
@@ -1496,7 +1499,7 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * data structure.
  */
 static RelOptInfo *
-make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
+make_rel_from_joinlist(PlannerInfo *root, List *joinlist, int removal_flags)
 {
 	int			levels_needed;
 	List	   *initial_rels;
@@ -1528,11 +1531,23 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
 			int			varno = ((RangeTblRef *) jlnode)->rtindex;
 
 			thisrel = find_base_rel(root, varno);
+
+			/*
+			 * If this relation can be removed for these removal_flags, then
+			 * we'll not bother including this in the list of relations to join
+			 * to
+			 */
+			if ((thisrel->removal_flags & removal_flags))
+			{
+				/* one less level needed too */
+				levels_needed--;
+				continue;
+			}
 		}
 		else if (IsA(jlnode, List))
 		{
 			/* Recurse to handle subproblem */
-			thisrel = make_rel_from_joinlist(root, (List *) jlnode);
+			thisrel = make_rel_from_joinlist(root, (List *) jlnode, removal_flags);
 		}
 		else
 		{
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index eb65c97..8ddc9db 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -49,8 +49,6 @@ static List *generate_join_implied_equalities_broken(PlannerInfo *root,
 										Relids outer_relids,
 										Relids nominal_inner_relids,
 										RelOptInfo *inner_rel);
-static Oid select_equality_operator(EquivalenceClass *ec,
-						 Oid lefttype, Oid righttype);
 static RestrictInfo *create_join_clause(PlannerInfo *root,
 				   EquivalenceClass *ec, Oid opno,
 				   EquivalenceMember *leftem,
@@ -1282,7 +1280,7 @@ generate_join_implied_equalities_broken(PlannerInfo *root,
  *
  * Returns InvalidOid if no operator can be found for this datatype combination
  */
-static Oid
+Oid
 select_equality_operator(EquivalenceClass *ec, Oid lefttype, Oid righttype)
 {
 	ListCell   *lc;
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 11d3933..f52e1ee 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -32,13 +32,21 @@
 #include "utils/lsyscache.h"
 
 /* local functions */
-static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					  RangeTblRef *removalrtr, Relids ignoredrels);
+static bool leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
+static bool relation_is_needed(PlannerInfo *root, Relids joinrelids,
+					  RelOptInfo *rel, Relids ignoredrels);
+static bool relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+					  RelOptInfo *referencedrel, List *referencing_vars,
+					  List *index_vars, List *operator_list);
+static bool expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					  List *indexvars, List *operators);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
 					  Relids joinrelids);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static Oid	distinct_col_search(int colno, List *colnos, List *opids);
 
-
 /*
  * remove_useless_joins
  *		Check for relations that don't actually need to be joined at all,
@@ -46,26 +54,104 @@ static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  *
  * We are passed the current joinlist and return the updated list.  Other
  * data structures that have to be updated are accessible via "root".
+ *
+ * There are 2 methods here for removing joins. Joins such as LEFT JOINs
+ * which can be proved to be needless due to lack of use of any of the joining
+ * relation's columns and the existence of a unique index on a subset of the
+ * join clause, can simply be removed from the query plan at plan time. For
+ * certain other join types we make use of foreign keys to attempt to prove the
+ * join is needless, though, for these we're unable to be certain that the join
+ * is not required at plan time, as if the plan is executed when pending
+ * foreign key triggers have not yet been fired, then the foreign key is
+ * effectively violated until these triggers have fired. Removing a join in
+ * such a case could cause a query to produce incorrect results.
+ *
+ * Instead we handle this case by marking the RangeTblEntry for the relation
+ * with a special flag which tells the executor that it's possible that joining
+ * to this relation may not be required. The executor may then check this flag
+ * and choose to skip the join based on if there are foreign key triggers
+ * pending or not.
  */
 List *
 remove_useless_joins(PlannerInfo *root, List *joinlist)
 {
 	ListCell   *lc;
+	Relids		removedrels = NULL;
 
 	/*
-	 * We are only interested in relations that are left-joined to, so we can
-	 * scan the join_info_list to find them easily.
+	 * Start by analyzing INNER JOINed relations in order to determine if any
+	 * of the relations can be ignored.
 	 */
 restart:
+	foreach(lc, joinlist)
+	{
+		RangeTblRef		*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo		*rel;
+
+		if (!IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = root->simple_rel_array[rtr->rtindex];
+
+		/* Don't try to remove this one again if we've already removed it */
+		if ((rel->removal_flags & PLAN_SUITABILITY_FK_TRIGGER_EMPTY) != 0)
+			continue;
+
+		/* skip if the join can't be removed */
+		if (!innerjoin_is_removable(root, joinlist, rtr, removedrels))
+			continue;
+
+		/*
+		 * Since we're not actually removing the join here, we need to maintain
+		 * a list of relations that we've "removed" so when we're checking if
+		 * other relations can be removed we'll know that if the to be removed
+		 * relation is only referenced by a relation that we've already removed
+		 * that it can be safely assumed that the relation is not referenced by
+		 * any useful relation.
+		 */
+		removedrels = bms_add_member(removedrels, rtr->rtindex);
+
+		/*
+		 * Mark that this relation is only required when the foreign key trigger
+		 * queue is non-empty.
+		 */
+		rel->removal_flags |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Globally mark this plan to say that there are some relations which
+		 * are only required when the foreign key trigger queue is non-empty.
+		 * The planner will later generate 2 plans, 1 which is suitable only
+		 * when all of these bitmask conditions are met, and another which is
+		 * an all purpose plan, which will be used if *any* of the bitmask's
+		 * conditions are not met.
+		 */
+		root->glob->suitableFor |= PLAN_SUITABILITY_FK_TRIGGER_EMPTY;
+
+		/*
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that since we've added
+		 * this relation to the removedrels, we may now realize that other
+		 * relations can also be removed as they're only referenced by the one
+		 * that we've just marked as possibly removable).
+		 */
+		goto restart;
+	}
+
+	/* now process special joins. Currently only left joins are supported */
 	foreach(lc, root->join_info_list)
 	{
 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
 		int			innerrelid;
 		int			nremoved;
 
-		/* Skip if not removable */
-		if (!join_is_removable(root, sjinfo))
-			continue;
+		if (sjinfo->jointype == JOIN_LEFT)
+		{
+			/* Skip if not removable */
+			if (!leftjoin_is_removable(root, sjinfo))
+				continue;
+		}
+		else
+			continue; /* we don't support this join type */
 
 		/*
 		 * Currently, join_is_removable can only succeed when the sjinfo's
@@ -91,12 +177,11 @@ restart:
 		root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);
 
 		/*
-		 * Restart the scan.  This is necessary to ensure we find all
-		 * removable joins independently of ordering of the join_info_list
-		 * (note that removal of attr_needed bits may make a join appear
-		 * removable that did not before).  Also, since we just deleted the
-		 * current list cell, we'd have to have some kluge to continue the
-		 * list scan anyway.
+		 * Restart the scan.  This is necessary to ensure we find all removable
+		 * joins independently of their ordering. (note that removal of
+		 * attr_needed bits may make a join, inner or outer, appear removable
+		 * that did not before).   Also, since we just deleted the current list
+		 * cell, we'd have to have some kluge to continue the list scan anyway.
 		 */
 		goto restart;
 	}
@@ -136,8 +221,230 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
 }
 
 /*
- * join_is_removable
- *	  Check whether we need not perform this special join at all, because
+ * innerjoin_is_removable
+ *		True if the join to removalrtr can be removed.
+ *
+ * In order to prove a relation which is inner joined is not required we must
+ * be sure that the join would emit exactly 1 row on the join condition. This
+ * differs from the logic which is used for proving LEFT JOINs can be removed,
+ * where it's possible to just check that a unique index exists on the relation
+ * being removed which has a set of columns that is a subset of the columns
+ * seen in the join condition. If no matching row is found then left join would
+ * not remove the non-matched row from the result set. This is not the case
+ * with INNER JOINs, so here we must use foreign keys as proof that the 1 row
+ * exists before we can allow any joins to be removed. With INNER JOINs we
+ * require that the join condition columns match the foreign key exactly,
+ * if there are any extra conditions then we can't ensure that the row exists,
+ * if any are missing, then we can't ensure that the relation will only produce
+ * at most 1 row for each matching on the joining relation.
+ */
+static bool
+innerjoin_is_removable(PlannerInfo *root, List *joinlist,
+					   RangeTblRef *removalrtr, Relids ignoredrels)
+{
+	ListCell   *lc;
+	RelOptInfo *removalrel;
+
+	removalrel = find_base_rel(root, removalrtr->rtindex);
+
+	/*
+	 * As foreign keys may only reference base rels which have unique indexes,
+	 * we needn't go any further if we're not dealing with a base rel, or if
+	 * the base rel has no unique indexes. We'd also better abort if the
+	 * rtekind is anything but a relation, as things like sub-queries may have
+	 * grouping or distinct clauses that would cause us not to be able to use
+	 * the foreign key to prove the existence of a row matching the join
+	 * condition. We also abort if the rel has no eclass joins as such a rel
+	 * could well be joined using some operator which is not an equality
+	 * operator, or the rel may not even be inner joined at all.
+	 *
+	 * Here we actually only check if the rel has any indexes, ideally we'd be
+	 * checking for unique indexes, but we could only determine that by looping
+	 * over the indexlist, and this is likely too expensive a check to be worth
+	 * it here.
+	 */
+	if (removalrel->reloptkind != RELOPT_BASEREL ||
+		removalrel->rtekind != RTE_RELATION ||
+		removalrel->has_eclass_joins == false ||
+		removalrel->indexlist == NIL)
+		return false;
+
+	/*
+	 * Currently we disallow the removal if we find any baserestrictinfo items
+	 * on the relation being removed. The reason for this is that these would
+	 * filter out rows and make it so the foreign key cannot prove that we'll
+	 * match exactly 1 row on the join condition. However, this check is
+	 * currently probably a bit overly strict as it should be possible to just
+	 * check and ensure that each Var seen in the baserestrictinfo is also
+	 * present in an eclass and if so, just translate and move the whole
+	 * baserestrictinfo over to the relation which has the foreign key to prove
+	 * that this join is not needed. e.g:
+	 * SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id WHERE b.id = 1;
+	 * could become: SELECT a.* FROM a WHERE a.b_id = 1;
+	 */
+	if (removalrel->baserestrictinfo != NIL)
+		return false;
+
+	/*
+	 * Currently only eclass joins are supported, so if there are any non
+	 * eclass join quals then we'll report the join is non-removable.
+	 */
+	if (removalrel->joininfo != NIL)
+		return false;
+
+	/*
+	 * Now we'll search through each relation in the joinlist to see if we can
+	 * find a relation which has a foreign key which references removalrel on
+	 * the join condition. If we find a rel with a foreign key which matches
+	 * the join condition exactly, then we can be sure that exactly 1 row will
+	 * be matched on the join, if we also see that no Vars from the relation
+	 * are needed, then we can report the join as removable.
+	 */
+	foreach (lc, joinlist)
+	{
+		RangeTblRef	*rtr = (RangeTblRef *) lfirst(lc);
+		RelOptInfo	*rel;
+		ListCell	*lc2;
+		List		*referencing_vars;
+		List		*index_vars;
+		List		*operator_list;
+		Relids		 joinrelids;
+
+		/* we can't remove ourself, or anything other than RangeTblRefs */
+		if (rtr == removalrtr || !IsA(rtr, RangeTblRef))
+			continue;
+
+		rel = find_base_rel(root, rtr->rtindex);
+
+		/*
+		 * The only relation type that can help us is a base rel with at least
+		 * one foreign key defined, if there's no eclass joins then this rel
+		 * is not going to help us prove the removalrel is not needed.
+		 */
+		if (rel->reloptkind != RELOPT_BASEREL ||
+			rel->rtekind != RTE_RELATION ||
+			rel->has_eclass_joins == false ||
+			rel->fklist == NIL)
+			continue;
+
+		/*
+		 * Both rels have eclass joins, but do they have eclass joins to each
+		 * other? Skip this rel if it does not.
+		 */
+		if (!have_relevant_eclass_joinclause(root, rel, removalrel))
+			continue;
+
+		joinrelids = bms_union(rel->relids, removalrel->relids);
+
+		/* if any of the Vars from the relation are needed then abort */
+		if (relation_is_needed(root, joinrelids, removalrel, ignoredrels))
+			return false;
+
+		referencing_vars = NIL;
+		index_vars = NIL;
+		operator_list = NIL;
+
+		/* now populate the lists with the join condition Vars */
+		foreach(lc2, root->eq_classes)
+		{
+			EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc2);
+
+			if (list_length(ec->ec_members) < 2 || ec->ec_has_volatile)
+				continue;
+
+			if (bms_overlap(removalrel->relids, ec->ec_relids) &&
+				bms_overlap(rel->relids, ec->ec_relids))
+			{
+				ListCell *lc3;
+				Var *refvar = NULL;
+				Var *idxvar = NULL;
+
+				/*
+				 * Look at each member of the eclass and try to find a Var from
+				 * each side of the join that we can append to the list of
+				 * columns that should be checked against each foreign key.
+				 *
+				 * The following logic does not allow for join removals to take
+				 * place for foreign keys that have duplicate columns on the
+				 * referencing side of the foreign key, such as:
+				 * (a,a) references (x,y)
+				 * The use case for such a foreign key is likely small enough
+				 * that we needn't bother making this code any more complex to
+				 * solve. If we find more than 1 Var from any of the rels then
+				 * we'll bail out.
+				 */
+				foreach (lc3, ec->ec_members)
+				{
+					EquivalenceMember *ecm = (EquivalenceMember *) lfirst(lc3);
+
+					Var *var = (Var *) ecm->em_expr;
+
+					if (!IsA(var, Var))
+						continue; /* Skip non Vars */
+
+					if (var->varno == rel->relid)
+					{
+						if (refvar != NULL)
+							return false;
+						refvar = var;
+					}
+
+					else if (var->varno == removalrel->relid)
+					{
+						if (idxvar != NULL)
+							return false;
+						idxvar = var;
+					}
+				}
+
+				if (refvar != NULL && idxvar != NULL)
+				{
+					Oid opno;
+					Oid reloid = root->simple_rte_array[refvar->varno]->relid;
+
+					/*
+					 * We cannot allow the removal to take place if any of the
+					 * columns in the join condition are nullable. This is due
+					 * to the fact that the join condition would end up
+					 * filtering out NULL values for us, but if we remove the
+					 * join, then there's nothing to stop the NULLs getting
+					 * into the resultset.
+					 */
+					if (!get_attnotnull(reloid, refvar->varattno))
+						return false;
+
+					/* grab the correct equality operator for these two vars */
+					opno = select_equality_operator(ec, refvar->vartype, idxvar->vartype);
+
+					if (!OidIsValid(opno))
+						return false;
+
+					referencing_vars = lappend(referencing_vars, refvar);
+					index_vars = lappend(index_vars, idxvar);
+					operator_list = lappend_oid(operator_list, opno);
+				}
+			}
+		}
+
+		/*
+		 * Did we find any conditions? It's ok that we just check 1 of the 3
+		 * lists to see if it's empty here as these will always contain the
+		 * same number of items
+		 */
+		if (referencing_vars != NIL)
+		{
+			if (relation_has_foreign_key_for(root, rel, removalrel,
+				referencing_vars, index_vars, operator_list))
+				return true; /* removalrel can be removed */
+		}
+	}
+
+	return false; /* can't remove join */
+}
+
+/*
+ * leftjoin_is_removable
+ *	  Check whether we need not perform this left join at all, because
  *	  it will just duplicate its left input.
  *
  * This is true for a left join for which the join condition cannot match
@@ -147,7 +454,7 @@ clause_sides_match_join(RestrictInfo *rinfo, Relids outerrelids,
  * above the join.
  */
 static bool
-join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
+leftjoin_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 {
 	int			innerrelid;
 	RelOptInfo *innerrel;
@@ -155,14 +462,14 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	Relids		joinrelids;
 	List	   *clause_list = NIL;
 	ListCell   *l;
-	int			attroff;
+
+	Assert(sjinfo->jointype == JOIN_LEFT);
 
 	/*
-	 * Must be a non-delaying left join to a single baserel, else we aren't
+	 * Must be a non-delaying join to a single baserel, else we aren't
 	 * going to be able to do anything with it.
 	 */
-	if (sjinfo->jointype != JOIN_LEFT ||
-		sjinfo->delay_upper_joins)
+	if (sjinfo->delay_upper_joins)
 		return false;
 
 	if (!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))
@@ -206,52 +513,9 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	/* Compute the relid set for the join we are considering */
 	joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
 
-	/*
-	 * We can't remove the join if any inner-rel attributes are used above the
-	 * join.
-	 *
-	 * Note that this test only detects use of inner-rel attributes in higher
-	 * join conditions and the target list.  There might be such attributes in
-	 * pushed-down conditions at this join, too.  We check that case below.
-	 *
-	 * As a micro-optimization, it seems better to start with max_attr and
-	 * count down rather than starting with min_attr and counting up, on the
-	 * theory that the system attributes are somewhat less likely to be wanted
-	 * and should be tested last.
-	 */
-	for (attroff = innerrel->max_attr - innerrel->min_attr;
-		 attroff >= 0;
-		 attroff--)
-	{
-		if (!bms_is_subset(innerrel->attr_needed[attroff], joinrelids))
-			return false;
-	}
-
-	/*
-	 * Similarly check that the inner rel isn't needed by any PlaceHolderVars
-	 * that will be used above the join.  We only need to fail if such a PHV
-	 * actually references some inner-rel attributes; but the correct check
-	 * for that is relatively expensive, so we first check against ph_eval_at,
-	 * which must mention the inner rel if the PHV uses any inner-rel attrs as
-	 * non-lateral references.  Note that if the PHV's syntactic scope is just
-	 * the inner rel, we can't drop the rel even if the PHV is variable-free.
-	 */
-	foreach(l, root->placeholder_list)
-	{
-		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
-
-		if (bms_is_subset(phinfo->ph_needed, joinrelids))
-			continue;			/* PHV is not used above the join */
-		if (bms_overlap(phinfo->ph_lateral, innerrel->relids))
-			return false;		/* it references innerrel laterally */
-		if (!bms_overlap(phinfo->ph_eval_at, innerrel->relids))
-			continue;			/* it definitely doesn't reference innerrel */
-		if (bms_is_subset(phinfo->ph_eval_at, innerrel->relids))
-			return false;		/* there isn't any other place to eval PHV */
-		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
-						innerrel->relids))
-			return false;		/* it does reference innerrel */
-	}
+	/* if the relation is referenced in the query then it cannot be removed */
+	if (relation_is_needed(root, joinrelids, innerrel, NULL))
+		return false;
 
 	/*
 	 * Search for mergejoinable clauses that constrain the inner rel against
@@ -368,6 +632,218 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 	return false;
 }
 
+/*
+ * relation_is_needed
+ *		True if any of the Vars from this relation are required in the query
+ */
+static inline bool
+relation_is_needed(PlannerInfo *root, Relids joinrelids, RelOptInfo *rel, Relids ignoredrels)
+{
+	int		  attroff;
+	ListCell *l;
+
+	/*
+	 * rel is referenced if any of its attributes are used above the join.
+	 *
+	 * Note that this test only detects use of rel's attributes in higher
+	 * join conditions and the target list.  There might be such attributes in
+	 * pushed-down conditions at this join, too.
+	 *
+	 * As a micro-optimization, it seems better to start with max_attr and
+	 * count down rather than starting with min_attr and counting up, on the
+	 * theory that the system attributes are somewhat less likely to be wanted
+	 * and should be tested last.
+	 */
+	for (attroff = rel->max_attr - rel->min_attr;
+		 attroff >= 0;
+		 attroff--)
+	{
+		if (!bms_is_subset(bms_difference(rel->attr_needed[attroff], ignoredrels), joinrelids))
+			return true;
+	}
+
+	/*
+	 * Similarly check that rel isn't needed by any PlaceHolderVars that will
+	 * be used above the join.  We only need to fail if such a PHV actually
+	 * references some of rel's attributes; but the correct check for that is
+	 * relatively expensive, so we first check against ph_eval_at, which must
+	 * mention rel if the PHV uses any of rel's attrs as non-lateral
+	 * references.  Note that if the PHV's syntactic scope is just rel, we
+	 * can't return true even if the PHV is variable-free.
+	 */
+	foreach(l, root->placeholder_list)
+	{
+		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
+
+		if (bms_is_subset(phinfo->ph_needed, joinrelids))
+			continue;			/* PHV is not used above the join */
+		if (bms_overlap(phinfo->ph_lateral, rel->relids))
+			return true;		/* it references rel laterally */
+		if (!bms_overlap(phinfo->ph_eval_at, rel->relids))
+			continue;			/* it definitely doesn't reference rel */
+		if (bms_is_subset(phinfo->ph_eval_at, rel->relids))
+			return true;		/* there isn't any other place to eval PHV */
+		if (bms_overlap(pull_varnos((Node *) phinfo->ph_var->phexpr),
+						rel->relids))
+			return true;		/* it does reference rel */
+	}
+
+	return false; /* it does not reference rel */
+}
+
+/*
+ * relation_has_foreign_key_for
+ *	  Checks if rel has a foreign key which references referencedrel with the
+ *	  given list of expressions.
+ *
+ *	For the match to succeed:
+ *	  referencing_vars must match the columns defined in the foreign key.
+ *	  index_vars must match the columns defined in the index for the foreign key.
+ */
+static bool
+relation_has_foreign_key_for(PlannerInfo *root, RelOptInfo *rel,
+			RelOptInfo *referencedrel, List *referencing_vars,
+			List *index_vars, List *operator_list)
+{
+	ListCell *lc;
+	Oid		  refreloid;
+
+	/*
+	 * Look up the Oid of the referenced relation. We only want to look at
+	 * foreign keys on the referencing relation which reference this relation.
+	 */
+	refreloid = root->simple_rte_array[referencedrel->relid]->relid;
+
+	Assert(list_length(referencing_vars) > 0);
+	Assert(list_length(referencing_vars) == list_length(index_vars));
+	Assert(list_length(referencing_vars) == list_length(operator_list));
+
+	/*
+	 * Search through each foreign key on the referencing relation and try
+	 * to find one which references the relation in the join condition. If we
+	 * find one then we'll send the join conditions off to
+	 * expressions_match_foreign_key() to see if they match the foreign key.
+	 */
+	foreach(lc, rel->fklist)
+	{
+		ForeignKeyInfo *fk = (ForeignKeyInfo *) lfirst(lc);
+
+		if (fk->confrelid == refreloid)
+		{
+			if (expressions_match_foreign_key(fk, referencing_vars,
+				index_vars, operator_list))
+				return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * expressions_match_foreign_key
+ *		True if the given fkvars, indexvars and operators will match
+ *		exactly 1 record in the referenced relation of the foreign key.
+ *
+ * Note: This function expects fkvars and indexvars to only contain Var types.
+ *		 Expression indexes are not supported by foreign keys.
+ */
+static bool
+expressions_match_foreign_key(ForeignKeyInfo *fk, List *fkvars,
+					List *indexvars, List *operators)
+{
+	ListCell  *lc;
+	ListCell  *lc2;
+	ListCell  *lc3;
+	Bitmapset *allitems;
+	Bitmapset *matcheditems;
+	int		   lstidx;
+	int		   col;
+
+	Assert(list_length(fkvars) == list_length(indexvars));
+	Assert(list_length(fkvars) == list_length(operators));
+
+	/*
+	 * Fast path out if there are not enough conditions to match each column
+	 * in the foreign key. Note that we cannot check that the number of
+	 * expressions is equal here since it would cause any expressions which
+	 * are duplicated not to match.
+	 */
+	if (list_length(fkvars) < fk->conncols)
+		return false;
+
+	/*
+	 * We need to ensure that each foreign key column can be matched to a list
+	 * item, and we need to ensure that each list item can be matched to a
+	 * foreign key column. We do this by looping over each foreign key column
+	 * and checking that we can find an item in the list which matches the
+	 * current column, however this method does not allow us to ensure that no
+	 * additional items exist in the list. We could solve that by performing
+	 * another loop over each list item and check that it matches a foreign key
+	 * column, but that's a bit wasteful. Instead we'll use 2 bitmapsets, one
+	 * to store the 0 based index of each list item, and with the other we'll
+	 * store each list index that we've managed to match. After we're done
+	 * matching we'll just make sure that both bitmapsets are equal.
+	 */
+	allitems = NULL;
+	matcheditems = NULL;
+
+	/*
+	 * Build a bitmapset which contains each 0 based list index. It seems more
+	 * efficient to do this in reverse so that we allocate enough memory for
+	 * the bitmapset on first loop rather than reallocating each time we find
+	 * we need a bit more space.
+	 */
+	for (lstidx = list_length(fkvars) - 1; lstidx >= 0; lstidx--)
+		allitems = bms_add_member(allitems, lstidx);
+
+	for (col = 0; col < fk->conncols; col++)
+	{
+		bool  matched = false;
+
+		lstidx = 0;
+
+		forthree(lc, fkvars, lc2, indexvars, lc3, operators)
+		{
+			Var *expr = (Var *) lfirst(lc);
+			Var *idxexpr = (Var *) lfirst(lc2);
+			Oid  opr = lfirst_oid(lc3);
+
+			Assert(IsA(expr, Var));
+			Assert(IsA(idxexpr, Var));
+
+			/* Does this join qual match up to the current fkey column? */
+			if (fk->conkey[col] == expr->varattno &&
+				fk->confkey[col] == idxexpr->varattno &&
+				equality_ops_are_compatible(opr, fk->conpfeqop[col]))
+			{
+				matched = true;
+
+				/* mark this list item as matched */
+				matcheditems = bms_add_member(matcheditems, lstidx);
+
+				/*
+				 * Don't break here as there may be duplicate expressions
+				 * that we also need to match against.
+				 */
+			}
+			lstidx++;
+		}
+
+		/* punt if there's no match. */
+		if (!matched)
+			return false;
+	}
+
+	/*
+	 * Ensure that we managed to match every item in the list to a foreign key
+	 * column.
+	 */
+	if (!bms_equal(allitems, matcheditems))
+		return false;
+
+	return true; /* matched */
+}
+
 
 /*
  * Remove the target relid from the planner's data structures, having
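
The two-bitmapset check described in expressions_match_foreign_key() above can be sketched outside the backend as follows. This is only an illustrative sketch, not code from the patch: FkDef, JoinQual and quals_match_fk are hypothetical stand-ins, a plain 64-bit mask replaces PostgreSQL's Bitmapset (so at most 64 quals are handled), and the operator compatibility test done via equality_ops_are_compatible() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one join qual: fkcol = refcol (not a PostgreSQL type) */
typedef struct JoinQual
{
	int			fk_attno;	/* column on the referencing side */
	int			ref_attno;	/* column on the referenced side */
} JoinQual;

/* Hypothetical stand-in for the patch's ForeignKeyInfo */
typedef struct FkDef
{
	int			ncols;		/* number of columns in the foreign key */
	const int  *conkey;		/* referencing columns */
	const int  *confkey;	/* referenced columns */
} FkDef;

/*
 * True if every foreign key column is matched by at least one qual and
 * every qual matches some foreign key column.
 */
static bool
quals_match_fk(const FkDef *fk, const JoinQual *quals, int nquals)
{
	uint64_t	allitems;
	uint64_t	matcheditems = 0;
	int			col;
	int			i;

	/* assumption of this sketch: one bit per qual in a 64-bit mask */
	assert(nquals > 0 && nquals <= 64);

	/* fast path: too few quals to cover every foreign key column */
	if (nquals < fk->ncols)
		return false;

	/* one bit set for each 0 based qual index */
	allitems = (nquals == 64) ? UINT64_MAX : ((UINT64_C(1) << nquals) - 1);

	for (col = 0; col < fk->ncols; col++)
	{
		bool		matched = false;

		for (i = 0; i < nquals; i++)
		{
			if (quals[i].fk_attno == fk->conkey[col] &&
				quals[i].ref_attno == fk->confkey[col])
			{
				matched = true;
				/* no break: duplicated quals must be marked as well */
				matcheditems |= UINT64_C(1) << i;
			}
		}

		/* this foreign key column has no matching qual */
		if (!matched)
			return false;
	}

	/* any qual left unmarked did not correspond to a foreign key column */
	return allitems == matcheditems;
}
```

With a two-column key, the quals {(1,1), (2,2)} and the duplicated set {(1,1), (2,2), (1,1)} are accepted, while {(1,1), (2,2), (3,3)} is rejected because the third qual sets a bit in allitems that never appears in matcheditems.
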
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index cb69c03..6f02834 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4679,6 +4679,15 @@ make_lockrows(Plan *lefttree, List *rowMarks, int epqParam)
 	return node;
 }
 
+AlternativePlan *
+make_alternativeplan(List *planlist)
+{
+	AlternativePlan *node = makeNode(AlternativePlan);
+	node->planList = planlist;
+
+	return node;
+}
+
 /*
  * Note: offset_est and count_est are passed in to save having to repeat
  * work already done to estimate the values of the limitOffset and limitCount
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index af772a2..22c9f30 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -409,6 +409,7 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	Path	   *sorted_path;
 	Cost		path_cost;
 	double		path_fraction;
+	List	   *final_rel_list;
 
 	/*----------
 	 * Generate modified query of the form
@@ -479,8 +480,12 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
 	subroot->tuple_fraction = 1.0;
 	subroot->limit_tuples = 1.0;
 
-	final_rel = query_planner(subroot, parse->targetList,
-							  minmax_qp_callback, NULL);
+	final_rel_list = query_planner(subroot, parse->targetList,
+							  minmax_qp_callback, NULL, true);
+
+	Assert(list_length(final_rel_list) == 1);
+
+	final_rel = (RelOptInfo *) linitial(final_rel_list);
 
 	/*
 	 * Get the best presorted path, that being the one that's cheapest for
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 848df97..b7bf67d 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -34,7 +34,7 @@
  *
  * Since query_planner does not handle the toplevel processing (grouping,
  * sorting, etc) it cannot select the best path by itself.  Instead, it
- * returns the RelOptInfo for the top level of joining, and the caller
+ * returns a list of RelOptInfo for the top level of joining, and the caller
  * (grouping_planner) can choose one of the surviving paths for the rel.
  * Normally it would choose either the rel's cheapest path, or the cheapest
  * path for the desired sort order.
@@ -50,14 +50,23 @@
  * plan.  This value is *not* available at call time, but is computed by
  * qp_callback once we have completed merging the query's equivalence classes.
  * (We cannot construct canonical pathkeys until that's done.)
+ *
+ * Note: during the planning process, the planner may discover optimization
+ * opportunities that may or may not be possible to utilize during query
+ * execution. In this case the planner will generate 2 plans: one fully
+ * optimized version, and one all purpose plan which will only be used if
+ * conditions are not found to be favourable for the optimized version of the
+ * plan during executor startup.
  */
-RelOptInfo *
+List *
 query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra)
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool all_purpose_plan_only)
 {
 	Query	   *parse = root->parse;
 	List	   *joinlist;
 	RelOptInfo *final_rel;
+	List	   *final_rel_list = NIL;
 	Index		rti;
 	double		total_pages;
 
@@ -84,7 +93,7 @@ query_planner(PlannerInfo *root, List *tlist,
 		root->canon_pathkeys = NIL;
 		(*qp_callback) (root, qp_extra);
 
-		return final_rel;
+		return list_make1(final_rel);
 	}
 
 	/*
@@ -231,14 +240,37 @@ query_planner(PlannerInfo *root, List *tlist,
 	root->total_table_pages = total_pages;
 
 	/*
-	 * Ready to do the primary planning.
+	 * If the planner found any optimizations that caused the plan not to be
+	 * suitable in all situations, then we must create 2 plans. One will be
+	 * the fully optimized version and the other will be a general purpose
+	 * plan that will only be used by the executor if any of the required
+	 * conditions for the optimization were not met. Note that we'll only
+	 * generate an optimized plan if the caller didn't specifically request an
+	 * all purpose plan.
 	 */
-	final_rel = make_one_rel(root, joinlist);
+	if (root->glob->suitableFor != PLAN_SUITABILITY_ALL_PURPOSE &&
+		!all_purpose_plan_only)
+	{
+		/* Generate fully optimized plan, with all removable joins removed */
+		final_rel = make_one_rel(root, joinlist, root->glob->suitableFor);
+
+		/* Check that we got at least one usable path */
+		if (!final_rel || !final_rel->cheapest_total_path ||
+			final_rel->cheapest_total_path->param_info != NULL)
+			elog(ERROR, "failed to construct the join relation");
+
+		final_rel_list = lappend(final_rel_list, final_rel);
+	}
+
+	/* generate an all purpose plan */
+	final_rel = make_one_rel(root, joinlist, PLAN_SUITABILITY_ALL_PURPOSE);
 
 	/* Check that we got at least one usable path */
 	if (!final_rel || !final_rel->cheapest_total_path ||
 		final_rel->cheapest_total_path->param_info != NULL)
 		elog(ERROR, "failed to construct the join relation");
 
-	return final_rel;
+	final_rel_list = lappend(final_rel_list, final_rel);
+
+	return final_rel_list;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 05687a4..8abf1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -178,6 +178,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	glob->lastRowMarkId = 0;
 	glob->transientPlan = false;
 	glob->hasRowSecurity = false;
+	glob->suitableFor = PLAN_SUITABILITY_ALL_PURPOSE;
 
 	/* Determine what fraction of the plan is likely to be scanned */
 	if (cursorOptions & CURSOR_OPT_FAST_PLAN)
@@ -256,6 +257,7 @@ standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
 	result->invalItems = glob->invalItems;
 	result->nParamExec = glob->nParamExec;
 	result->hasRowSecurity = glob->hasRowSecurity;
+	result->suitableFor = glob->suitableFor;
 
 	return result;
 }
@@ -1103,10 +1105,12 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	int64		count_est = 0;
 	double		limit_tuples = -1.0;
 	Plan	   *result_plan;
+	List	   *result_plan_list = NIL;
 	List	   *current_pathkeys;
 	double		dNumGroups = 0;
 	bool		use_hashed_distinct = false;
 	bool		tested_hashed_distinct = false;
+	ListCell   *lc;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1185,6 +1189,8 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		root->sort_pathkeys = make_pathkeys_for_sortclauses(root,
 															parse->sortClause,
 															tlist);
+
+		result_plan_list = list_make1(result_plan);
 	}
 	else
 	{
@@ -1194,6 +1200,7 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		bool		need_tlist_eval = true;
 		standard_qp_extra qp_extra;
 		RelOptInfo *final_rel;
+		List	   *final_rel_list;
 		Path	   *cheapest_path;
 		Path	   *sorted_path;
 		Path	   *best_path;
@@ -1304,710 +1311,723 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 		 * standard_qp_callback) pathkey representations of the query's sort
 		 * clause, distinct clause, etc.
 		 */
-		final_rel = query_planner(root, sub_tlist,
-								  standard_qp_callback, &qp_extra);
-
-		/*
-		 * Extract rowcount and width estimates for use below.
-		 */
-		path_rows = final_rel->rows;
-		path_width = final_rel->width;
+		final_rel_list = query_planner(root, sub_tlist,
+							  standard_qp_callback, &qp_extra, false);
 
-		/*
-		 * If there's grouping going on, estimate the number of result groups.
-		 * We couldn't do this any earlier because it depends on relation size
-		 * estimates that are created within query_planner().
-		 *
-		 * Then convert tuple_fraction to fractional form if it is absolute,
-		 * and if grouping or aggregation is involved, adjust tuple_fraction
-		 * to describe the fraction of the underlying un-aggregated tuples
-		 * that will be fetched.
-		 */
-		dNumGroups = 1;			/* in case not grouping */
-
-		if (parse->groupClause)
+		foreach(lc, final_rel_list)
 		{
-			List	   *groupExprs;
-
-			groupExprs = get_sortgrouplist_exprs(parse->groupClause,
-												 parse->targetList);
-			dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
-
+			final_rel = (RelOptInfo *) lfirst(lc);
 			/*
-			 * In GROUP BY mode, an absolute LIMIT is relative to the number
-			 * of groups not the number of tuples.  If the caller gave us a
-			 * fraction, keep it as-is.  (In both cases, we are effectively
-			 * assuming that all the groups are about the same size.)
+			 * Extract rowcount and width estimates for use below.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
+			path_rows = final_rel->rows;
+			path_width = final_rel->width;
 
 			/*
-			 * If both GROUP BY and ORDER BY are specified, we will need two
-			 * levels of sort --- and, therefore, certainly need to read all
-			 * the tuples --- unless ORDER BY is a subset of GROUP BY.
-			 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
-			 * window specification not compatible with the GROUP BY.
-			 */
-			if (!pathkeys_contained_in(root->sort_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->distinct_pathkeys,
-									   root->group_pathkeys) ||
-				!pathkeys_contained_in(root->window_pathkeys,
-									   root->group_pathkeys))
-				tuple_fraction = 0.0;
-		}
-		else if (parse->hasAggs || root->hasHavingQual)
-		{
-			/*
-			 * Ungrouped aggregate will certainly want to read all the tuples,
-			 * and it will deliver a single result row (so leave dNumGroups
-			 * set to 1).
-			 */
-			tuple_fraction = 0.0;
-		}
-		else if (parse->distinctClause)
-		{
-			/*
-			 * Since there was no grouping or aggregation, it's reasonable to
-			 * assume the UNIQUE filter has effects comparable to GROUP BY.
-			 * (If DISTINCT is used with grouping, we ignore its effects for
-			 * rowcount estimation purposes; this amounts to assuming the
-			 * grouped rows are distinct already.)
-			 */
-			List	   *distinctExprs;
-
-			distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
-													parse->targetList);
-			dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
-
-			/*
-			 * Adjust tuple_fraction the same way as for GROUP BY, too.
-			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= dNumGroups;
-		}
-		else
-		{
-			/*
-			 * Plain non-grouped, non-aggregated query: an absolute tuple
-			 * fraction can be divided by the number of tuples.
+			 * If there's grouping going on, estimate the number of result groups.
+			 * We couldn't do this any earlier because it depends on relation size
+			 * estimates that are created within query_planner().
+			 *
+			 * Then convert tuple_fraction to fractional form if it is absolute,
+			 * and if grouping or aggregation is involved, adjust tuple_fraction
+			 * to describe the fraction of the underlying un-aggregated tuples
+			 * that will be fetched.
 			 */
-			if (tuple_fraction >= 1.0)
-				tuple_fraction /= path_rows;
-		}
+			dNumGroups = 1;			/* in case not grouping */
 
-		/*
-		 * Pick out the cheapest-total path as well as the cheapest presorted
-		 * path for the requested pathkeys (if there is one).  We should take
-		 * the tuple fraction into account when selecting the cheapest
-		 * presorted path, but not when selecting the cheapest-total path,
-		 * since if we have to sort then we'll have to fetch all the tuples.
-		 * (But there's a special case: if query_pathkeys is NIL, meaning
-		 * order doesn't matter, then the "cheapest presorted" path will be
-		 * the cheapest overall for the tuple fraction.)
-		 */
-		cheapest_path = final_rel->cheapest_total_path;
-
-		sorted_path =
-			get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
-													  root->query_pathkeys,
-													  NULL,
-													  tuple_fraction);
+			if (parse->groupClause)
+			{
+				List	   *groupExprs;
 
-		/* Don't consider same path in both guises; just wastes effort */
-		if (sorted_path == cheapest_path)
-			sorted_path = NULL;
+				groupExprs = get_sortgrouplist_exprs(parse->groupClause,
+													 parse->targetList);
+				dNumGroups = estimate_num_groups(root, groupExprs, path_rows);
 
-		/*
-		 * Forget about the presorted path if it would be cheaper to sort the
-		 * cheapest-total path.  Here we need consider only the behavior at
-		 * the tuple_fraction point.  Also, limit_tuples is only relevant if
-		 * not grouping/aggregating, so use root->limit_tuples in the
-		 * cost_sort call.
-		 */
-		if (sorted_path)
-		{
-			Path		sort_path;		/* dummy for result of cost_sort */
+				/*
+				 * In GROUP BY mode, an absolute LIMIT is relative to the number
+				 * of groups not the number of tuples.  If the caller gave us a
+				 * fraction, keep it as-is.  (In both cases, we are effectively
+				 * assuming that all the groups are about the same size.)
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
 
-			if (root->query_pathkeys == NIL ||
-				pathkeys_contained_in(root->query_pathkeys,
-									  cheapest_path->pathkeys))
-			{
-				/* No sort needed for cheapest path */
-				sort_path.startup_cost = cheapest_path->startup_cost;
-				sort_path.total_cost = cheapest_path->total_cost;
+				/*
+				 * If both GROUP BY and ORDER BY are specified, we will need two
+				 * levels of sort --- and, therefore, certainly need to read all
+				 * the tuples --- unless ORDER BY is a subset of GROUP BY.
+				 * Likewise if we have both DISTINCT and GROUP BY, or if we have a
+				 * window specification not compatible with the GROUP BY.
+				 */
+				if (!pathkeys_contained_in(root->sort_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->distinct_pathkeys,
+										   root->group_pathkeys) ||
+					!pathkeys_contained_in(root->window_pathkeys,
+										   root->group_pathkeys))
+					tuple_fraction = 0.0;
 			}
-			else
+			else if (parse->hasAggs || root->hasHavingQual)
 			{
-				/* Figure cost for sorting */
-				cost_sort(&sort_path, root, root->query_pathkeys,
-						  cheapest_path->total_cost,
-						  path_rows, path_width,
-						  0.0, work_mem, root->limit_tuples);
+				/*
+				 * Ungrouped aggregate will certainly want to read all the tuples,
+				 * and it will deliver a single result row (so leave dNumGroups
+				 * set to 1).
+				 */
+				tuple_fraction = 0.0;
 			}
-
-			if (compare_fractional_path_costs(sorted_path, &sort_path,
-											  tuple_fraction) > 0)
+			else if (parse->distinctClause)
 			{
-				/* Presorted path is a loser */
-				sorted_path = NULL;
-			}
-		}
+				/*
+				 * Since there was no grouping or aggregation, it's reasonable to
+				 * assume the UNIQUE filter has effects comparable to GROUP BY.
+				 * (If DISTINCT is used with grouping, we ignore its effects for
+				 * rowcount estimation purposes; this amounts to assuming the
+				 * grouped rows are distinct already.)
+				 */
+				List	   *distinctExprs;
 
-		/*
-		 * Consider whether we want to use hashing instead of sorting.
-		 */
-		if (parse->groupClause)
-		{
-			/*
-			 * If grouping, decide whether to use sorted or hashed grouping.
-			 */
-			use_hashed_grouping =
-				choose_hashed_grouping(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path, sorted_path,
-									   dNumGroups, &agg_costs);
-			/* Also convert # groups to long int --- but 'ware overflow! */
-			numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
-		}
-		else if (parse->distinctClause && sorted_path &&
-				 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
-		{
-			/*
-			 * We'll reach the DISTINCT stage without any intermediate
-			 * processing, so figure out whether we will want to hash or not
-			 * so we can choose whether to use cheapest or sorted path.
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   path_rows, path_width,
-									   cheapest_path->startup_cost,
-									   cheapest_path->total_cost,
-									   sorted_path->startup_cost,
-									   sorted_path->total_cost,
-									   sorted_path->pathkeys,
-									   dNumGroups);
-			tested_hashed_distinct = true;
-		}
+				distinctExprs = get_sortgrouplist_exprs(parse->distinctClause,
+														parse->targetList);
+				dNumGroups = estimate_num_groups(root, distinctExprs, path_rows);
 
-		/*
-		 * Select the best path.  If we are doing hashed grouping, we will
-		 * always read all the input tuples, so use the cheapest-total path.
-		 * Otherwise, the comparison above is correct.
-		 */
-		if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
-			best_path = cheapest_path;
-		else
-			best_path = sorted_path;
+				/*
+				 * Adjust tuple_fraction the same way as for GROUP BY, too.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= dNumGroups;
+			}
+			else
+			{
+				/*
+				 * Plain non-grouped, non-aggregated query: an absolute tuple
+				 * fraction can be divided by the number of tuples.
+				 */
+				if (tuple_fraction >= 1.0)
+					tuple_fraction /= path_rows;
+			}
 
-		/*
-		 * Check to see if it's possible to optimize MIN/MAX aggregates. If
-		 * so, we will forget all the work we did so far to choose a "regular"
-		 * path ... but we had to do it anyway to be able to tell which way is
-		 * cheaper.
-		 */
-		result_plan = optimize_minmax_aggregates(root,
-												 tlist,
-												 &agg_costs,
-												 best_path);
-		if (result_plan != NULL)
-		{
-			/*
-			 * optimize_minmax_aggregates generated the full plan, with the
-			 * right tlist, and it has no sort order.
-			 */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Normal case --- create a plan according to query_planner's
-			 * results.
+			 * Pick out the cheapest-total path as well as the cheapest presorted
+			 * path for the requested pathkeys (if there is one).  We should take
+			 * the tuple fraction into account when selecting the cheapest
+			 * presorted path, but not when selecting the cheapest-total path,
+			 * since if we have to sort then we'll have to fetch all the tuples.
+			 * (But there's a special case: if query_pathkeys is NIL, meaning
+			 * order doesn't matter, then the "cheapest presorted" path will be
+			 * the cheapest overall for the tuple fraction.)
 			 */
-			bool		need_sort_for_grouping = false;
+			cheapest_path = final_rel->cheapest_total_path;
 
-			result_plan = create_plan(root, best_path);
-			current_pathkeys = best_path->pathkeys;
+			sorted_path =
+				get_cheapest_fractional_path_for_pathkeys(final_rel->pathlist,
+														  root->query_pathkeys,
+														  NULL,
+														  tuple_fraction);
 
-			/* Detect if we'll need an explicit sort for grouping */
-			if (parse->groupClause && !use_hashed_grouping &&
-			  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
-			{
-				need_sort_for_grouping = true;
-
-				/*
-				 * Always override create_plan's tlist, so that we don't sort
-				 * useless data from a "physical" tlist.
-				 */
-				need_tlist_eval = true;
-			}
+			/* Don't consider same path in both guises; just wastes effort */
+			if (sorted_path == cheapest_path)
+				sorted_path = NULL;
 
 			/*
-			 * create_plan returns a plan with just a "flat" tlist of required
-			 * Vars.  Usually we need to insert the sub_tlist as the tlist of
-			 * the top plan node.  However, we can skip that if we determined
-			 * that whatever create_plan chose to return will be good enough.
+			 * Forget about the presorted path if it would be cheaper to sort the
+			 * cheapest-total path.  Here we need consider only the behavior at
+			 * the tuple_fraction point.  Also, limit_tuples is only relevant if
+			 * not grouping/aggregating, so use root->limit_tuples in the
+			 * cost_sort call.
 			 */
-			if (need_tlist_eval)
+			if (sorted_path)
 			{
-				/*
-				 * If the top-level plan node is one that cannot do expression
-				 * evaluation and its existing target list isn't already what
-				 * we need, we must insert a Result node to project the
-				 * desired tlist.
-				 */
-				if (!is_projection_capable_plan(result_plan) &&
-					!tlist_same_exprs(sub_tlist, result_plan->targetlist))
+				Path		sort_path;		/* dummy for result of cost_sort */
+
+				if (root->query_pathkeys == NIL ||
+					pathkeys_contained_in(root->query_pathkeys,
+										  cheapest_path->pathkeys))
 				{
-					result_plan = (Plan *) make_result(root,
-													   sub_tlist,
-													   NULL,
-													   result_plan);
+					/* No sort needed for cheapest path */
+					sort_path.startup_cost = cheapest_path->startup_cost;
+					sort_path.total_cost = cheapest_path->total_cost;
 				}
 				else
 				{
-					/*
-					 * Otherwise, just replace the subplan's flat tlist with
-					 * the desired tlist.
-					 */
-					result_plan->targetlist = sub_tlist;
+					/* Figure cost for sorting */
+					cost_sort(&sort_path, root, root->query_pathkeys,
+							  cheapest_path->total_cost,
+							  path_rows, path_width,
+							  0.0, work_mem, root->limit_tuples);
 				}
 
+				if (compare_fractional_path_costs(sorted_path, &sort_path,
+												  tuple_fraction) > 0)
+				{
+					/* Presorted path is a loser */
+					sorted_path = NULL;
+				}
+			}
+
+			/*
+			 * Consider whether we want to use hashing instead of sorting.
+			 */
+			if (parse->groupClause)
+			{
 				/*
-				 * Also, account for the cost of evaluation of the sub_tlist.
-				 * See comments for add_tlist_costs_to_plan() for more info.
+				 * If grouping, decide whether to use sorted or hashed grouping.
 				 */
-				add_tlist_costs_to_plan(root, result_plan, sub_tlist);
+				use_hashed_grouping =
+					choose_hashed_grouping(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path, sorted_path,
+										   dNumGroups, &agg_costs);
+				/* Also convert # groups to long int --- but 'ware overflow! */
+				numGroups = (long) Min(dNumGroups, (double) LONG_MAX);
 			}
-			else
+			else if (parse->distinctClause && sorted_path &&
+					 !root->hasHavingQual && !parse->hasAggs && !activeWindows)
 			{
 				/*
-				 * Since we're using create_plan's tlist and not the one
-				 * make_subplanTargetList calculated, we have to refigure any
-				 * grouping-column indexes make_subplanTargetList computed.
+				 * We'll reach the DISTINCT stage without any intermediate
+				 * processing, so figure out whether we will want to hash or not
+				 * so we can choose whether to use cheapest or sorted path.
 				 */
-				locate_grouping_columns(root, tlist, result_plan->targetlist,
-										groupColIdx);
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   path_rows, path_width,
+										   cheapest_path->startup_cost,
+										   cheapest_path->total_cost,
+										   sorted_path->startup_cost,
+										   sorted_path->total_cost,
+										   sorted_path->pathkeys,
+										   dNumGroups);
+				tested_hashed_distinct = true;
 			}
 
 			/*
-			 * Insert AGG or GROUP node if needed, plus an explicit sort step
-			 * if necessary.
-			 *
-			 * HAVING clause, if any, becomes qual of the Agg or Group node.
+			 * Select the best path.  If we are doing hashed grouping, we will
+			 * always read all the input tuples, so use the cheapest-total path.
+			 * Otherwise, the comparison above is correct.
 			 */
-			if (use_hashed_grouping)
+			if (use_hashed_grouping || use_hashed_distinct || !sorted_path)
+				best_path = cheapest_path;
+			else
+				best_path = sorted_path;
+
+			/*
+			 * Check to see if it's possible to optimize MIN/MAX aggregates. If
+			 * so, we will forget all the work we did so far to choose a "regular"
+			 * path ... but we had to do it anyway to be able to tell which way is
+			 * cheaper.
+			 */
+			result_plan = optimize_minmax_aggregates(root,
+													 tlist,
+													 &agg_costs,
+													 best_path);
+			if (result_plan != NULL)
 			{
-				/* Hashed aggregate plan --- no sort needed */
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												AGG_HASHED,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-				/* Hashed aggregation produces randomly-ordered results */
+				/*
+				 * optimize_minmax_aggregates generated the full plan, with the
+				 * right tlist, and it has no sort order.
+				 */
 				current_pathkeys = NIL;
 			}
-			else if (parse->hasAggs)
+			else
 			{
-				/* Plain aggregate plan --- sort if needed */
-				AggStrategy aggstrategy;
+				/*
+				 * Normal case --- create a plan according to query_planner's
+				 * results.
+				 */
+				bool		need_sort_for_grouping = false;
+
+				result_plan = create_plan(root, best_path);
+				current_pathkeys = best_path->pathkeys;
 
-				if (parse->groupClause)
+				/* Detect if we'll need an explicit sort for grouping */
+				if (parse->groupClause && !use_hashed_grouping &&
+				  !pathkeys_contained_in(root->group_pathkeys, current_pathkeys))
 				{
-					if (need_sort_for_grouping)
+					need_sort_for_grouping = true;
+
+					/*
+					 * Always override create_plan's tlist, so that we don't sort
+					 * useless data from a "physical" tlist.
+					 */
+					need_tlist_eval = true;
+				}
+
+				/*
+				 * create_plan returns a plan with just a "flat" tlist of required
+				 * Vars.  Usually we need to insert the sub_tlist as the tlist of
+				 * the top plan node.  However, we can skip that if we determined
+				 * that whatever create_plan chose to return will be good enough.
+				 */
+				if (need_tlist_eval)
+				{
+					/*
+					 * If the top-level plan node is one that cannot do expression
+					 * evaluation and its existing target list isn't already what
+					 * we need, we must insert a Result node to project the
+					 * desired tlist.
+					 */
+					if (!is_projection_capable_plan(result_plan) &&
+						!tlist_same_exprs(sub_tlist, result_plan->targetlist))
 					{
-						result_plan = (Plan *)
-							make_sort_from_groupcols(root,
-													 parse->groupClause,
-													 groupColIdx,
-													 result_plan);
-						current_pathkeys = root->group_pathkeys;
+						result_plan = (Plan *) make_result(root,
+														   sub_tlist,
+														   NULL,
+														   result_plan);
+					}
+					else
+					{
+						/*
+						 * Otherwise, just replace the subplan's flat tlist with
+						 * the desired tlist.
+						 */
+						result_plan->targetlist = sub_tlist;
 					}
-					aggstrategy = AGG_SORTED;
 
 					/*
-					 * The AGG node will not change the sort ordering of its
-					 * groups, so current_pathkeys describes the result too.
+					 * Also, account for the cost of evaluation of the sub_tlist.
+					 * See comments for add_tlist_costs_to_plan() for more info.
 					 */
+					add_tlist_costs_to_plan(root, result_plan, sub_tlist);
 				}
 				else
 				{
-					aggstrategy = AGG_PLAIN;
-					/* Result will be only one row anyway; no sort order */
-					current_pathkeys = NIL;
+					/*
+					 * Since we're using create_plan's tlist and not the one
+					 * make_subplanTargetList calculated, we have to refigure any
+					 * grouping-column indexes make_subplanTargetList computed.
+					 */
+					locate_grouping_columns(root, tlist, result_plan->targetlist,
+											groupColIdx);
 				}
 
-				result_plan = (Plan *) make_agg(root,
-												tlist,
-												(List *) parse->havingQual,
-												aggstrategy,
-												&agg_costs,
-												numGroupCols,
-												groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												numGroups,
-												result_plan);
-			}
-			else if (parse->groupClause)
-			{
 				/*
-				 * GROUP BY without aggregation, so insert a group node (plus
-				 * the appropriate sort node, if necessary).
+				 * Insert AGG or GROUP node if needed, plus an explicit sort step
+				 * if necessary.
 				 *
-				 * Add an explicit sort if we couldn't make the path come out
-				 * the way the GROUP node needs it.
+				 * HAVING clause, if any, becomes qual of the Agg or Group node.
 				 */
-				if (need_sort_for_grouping)
+				if (use_hashed_grouping)
 				{
-					result_plan = (Plan *)
-						make_sort_from_groupcols(root,
-												 parse->groupClause,
-												 groupColIdx,
-												 result_plan);
-					current_pathkeys = root->group_pathkeys;
+					/* Hashed aggregate plan --- no sort needed */
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													AGG_HASHED,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+					/* Hashed aggregation produces randomly-ordered results */
+					current_pathkeys = NIL;
 				}
+				else if (parse->hasAggs)
+				{
+					/* Plain aggregate plan --- sort if needed */
+					AggStrategy aggstrategy;
 
-				result_plan = (Plan *) make_group(root,
-												  tlist,
-												  (List *) parse->havingQual,
-												  numGroupCols,
-												  groupColIdx,
-									extract_grouping_ops(parse->groupClause),
-												  dNumGroups,
-												  result_plan);
-				/* The Group node won't change sort ordering */
-			}
-			else if (root->hasHavingQual)
-			{
-				/*
-				 * No aggregates, and no GROUP BY, but we have a HAVING qual.
-				 * This is a degenerate case in which we are supposed to emit
-				 * either 0 or 1 row depending on whether HAVING succeeds.
-				 * Furthermore, there cannot be any variables in either HAVING
-				 * or the targetlist, so we actually do not need the FROM
-				 * table at all!  We can just throw away the plan-so-far and
-				 * generate a Result node.  This is a sufficiently unusual
-				 * corner case that it's not worth contorting the structure of
-				 * this routine to avoid having to generate the plan in the
-				 * first place.
-				 */
-				result_plan = (Plan *) make_result(root,
-												   tlist,
-												   parse->havingQual,
-												   NULL);
-			}
-		}						/* end of non-minmax-aggregate case */
-
-		/*
-		 * Since each window function could require a different sort order, we
-		 * stack up a WindowAgg node for each window, with sort steps between
-		 * them as needed.
-		 */
-		if (activeWindows)
-		{
-			List	   *window_tlist;
-			ListCell   *l;
+					if (parse->groupClause)
+					{
+						if (need_sort_for_grouping)
+						{
+							result_plan = (Plan *)
+								make_sort_from_groupcols(root,
+														 parse->groupClause,
+														 groupColIdx,
+														 result_plan);
+							current_pathkeys = root->group_pathkeys;
+						}
+						aggstrategy = AGG_SORTED;
+
+						/*
+						 * The AGG node will not change the sort ordering of its
+						 * groups, so current_pathkeys describes the result too.
+						 */
+					}
+					else
+					{
+						aggstrategy = AGG_PLAIN;
+						/* Result will be only one row anyway; no sort order */
+						current_pathkeys = NIL;
+					}
 
-			/*
-			 * If the top-level plan node is one that cannot do expression
-			 * evaluation, we must insert a Result node to project the desired
-			 * tlist.  (In some cases this might not really be required, but
-			 * it's not worth trying to avoid it.  In particular, think not to
-			 * skip adding the Result if the initial window_tlist matches the
-			 * top-level plan node's output, because we might change the tlist
-			 * inside the following loop.)	Note that on second and subsequent
-			 * passes through the following loop, the top-level node will be a
-			 * WindowAgg which we know can project; so we only need to check
-			 * once.
-			 */
-			if (!is_projection_capable_plan(result_plan))
-			{
-				result_plan = (Plan *) make_result(root,
-												   NIL,
-												   NULL,
-												   result_plan);
-			}
+					result_plan = (Plan *) make_agg(root,
+													tlist,
+													(List *) parse->havingQual,
+													aggstrategy,
+													&agg_costs,
+													numGroupCols,
+													groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													numGroups,
+													result_plan);
+				}
+				else if (parse->groupClause)
+				{
+					/*
+					 * GROUP BY without aggregation, so insert a group node (plus
+					 * the appropriate sort node, if necessary).
+					 *
+					 * Add an explicit sort if we couldn't make the path come out
+					 * the way the GROUP node needs it.
+					 */
+					if (need_sort_for_grouping)
+					{
+						result_plan = (Plan *)
+							make_sort_from_groupcols(root,
+													 parse->groupClause,
+													 groupColIdx,
+													 result_plan);
+						current_pathkeys = root->group_pathkeys;
+					}
 
-			/*
-			 * The "base" targetlist for all steps of the windowing process is
-			 * a flat tlist of all Vars and Aggs needed in the result.  (In
-			 * some cases we wouldn't need to propagate all of these all the
-			 * way to the top, since they might only be needed as inputs to
-			 * WindowFuncs.  It's probably not worth trying to optimize that
-			 * though.)  We also add window partitioning and sorting
-			 * expressions to the base tlist, to ensure they're computed only
-			 * once at the bottom of the stack (that's critical for volatile
-			 * functions).  As we climb up the stack, we'll add outputs for
-			 * the WindowFuncs computed at each level.
-			 */
-			window_tlist = make_windowInputTargetList(root,
+					result_plan = (Plan *) make_group(root,
 													  tlist,
-													  activeWindows);
+													  (List *) parse->havingQual,
+													  numGroupCols,
+													  groupColIdx,
+										extract_grouping_ops(parse->groupClause),
+													  dNumGroups,
+													  result_plan);
+					/* The Group node won't change sort ordering */
+				}
+				else if (root->hasHavingQual)
+				{
+					/*
+					 * No aggregates, and no GROUP BY, but we have a HAVING qual.
+					 * This is a degenerate case in which we are supposed to emit
+					 * either 0 or 1 row depending on whether HAVING succeeds.
+					 * Furthermore, there cannot be any variables in either HAVING
+					 * or the targetlist, so we actually do not need the FROM
+					 * table at all!  We can just throw away the plan-so-far and
+					 * generate a Result node.  This is a sufficiently unusual
+					 * corner case that it's not worth contorting the structure of
+					 * this routine to avoid having to generate the plan in the
+					 * first place.
+					 */
+					result_plan = (Plan *) make_result(root,
+													   tlist,
+													   parse->havingQual,
+													   NULL);
+				}
+			}						/* end of non-minmax-aggregate case */
 
 			/*
-			 * The copyObject steps here are needed to ensure that each plan
-			 * node has a separately modifiable tlist.  (XXX wouldn't a
-			 * shallow list copy do for that?)
+			 * Since each window function could require a different sort order, we
+			 * stack up a WindowAgg node for each window, with sort steps between
+			 * them as needed.
 			 */
-			result_plan->targetlist = (List *) copyObject(window_tlist);
-
-			foreach(l, activeWindows)
+			if (activeWindows)
 			{
-				WindowClause *wc = (WindowClause *) lfirst(l);
-				List	   *window_pathkeys;
-				int			partNumCols;
-				AttrNumber *partColIdx;
-				Oid		   *partOperators;
-				int			ordNumCols;
-				AttrNumber *ordColIdx;
-				Oid		   *ordOperators;
-
-				window_pathkeys = make_pathkeys_for_window(root,
-														   wc,
-														   tlist);
+				List	   *window_tlist;
+				ListCell   *l;
 
 				/*
-				 * This is a bit tricky: we build a sort node even if we don't
-				 * really have to sort.  Even when no explicit sort is needed,
-				 * we need to have suitable resjunk items added to the input
-				 * plan's tlist for any partitioning or ordering columns that
-				 * aren't plain Vars.  (In theory, make_windowInputTargetList
-				 * should have provided all such columns, but let's not assume
-				 * that here.)	Furthermore, this way we can use existing
-				 * infrastructure to identify which input columns are the
-				 * interesting ones.
+				 * If the top-level plan node is one that cannot do expression
+				 * evaluation, we must insert a Result node to project the desired
+				 * tlist.  (In some cases this might not really be required, but
+				 * it's not worth trying to avoid it.  In particular, think not to
+				 * skip adding the Result if the initial window_tlist matches the
+				 * top-level plan node's output, because we might change the tlist
+				 * inside the following loop.)	Note that on second and subsequent
+				 * passes through the following loop, the top-level node will be a
+				 * WindowAgg which we know can project; so we only need to check
+				 * once.
 				 */
-				if (window_pathkeys)
-				{
-					Sort	   *sort_plan;
-
-					sort_plan = make_sort_from_pathkeys(root,
-														result_plan,
-														window_pathkeys,
-														-1.0);
-					if (!pathkeys_contained_in(window_pathkeys,
-											   current_pathkeys))
-					{
-						/* we do indeed need to sort */
-						result_plan = (Plan *) sort_plan;
-						current_pathkeys = window_pathkeys;
-					}
-					/* In either case, extract the per-column information */
-					get_column_info_for_window(root, wc, tlist,
-											   sort_plan->numCols,
-											   sort_plan->sortColIdx,
-											   &partNumCols,
-											   &partColIdx,
-											   &partOperators,
-											   &ordNumCols,
-											   &ordColIdx,
-											   &ordOperators);
-				}
-				else
+				if (!is_projection_capable_plan(result_plan))
 				{
-					/* empty window specification, nothing to sort */
-					partNumCols = 0;
-					partColIdx = NULL;
-					partOperators = NULL;
-					ordNumCols = 0;
-					ordColIdx = NULL;
-					ordOperators = NULL;
+					result_plan = (Plan *) make_result(root,
+													   NIL,
+													   NULL,
+													   result_plan);
 				}
 
-				if (lnext(l))
-				{
-					/* Add the current WindowFuncs to the running tlist */
-					window_tlist = add_to_flat_tlist(window_tlist,
-										   wflists->windowFuncs[wc->winref]);
-				}
-				else
+				/*
+				 * The "base" targetlist for all steps of the windowing process is
+				 * a flat tlist of all Vars and Aggs needed in the result.  (In
+				 * some cases we wouldn't need to propagate all of these all the
+				 * way to the top, since they might only be needed as inputs to
+				 * WindowFuncs.  It's probably not worth trying to optimize that
+				 * though.)  We also add window partitioning and sorting
+				 * expressions to the base tlist, to ensure they're computed only
+				 * once at the bottom of the stack (that's critical for volatile
+				 * functions).  As we climb up the stack, we'll add outputs for
+				 * the WindowFuncs computed at each level.
+				 */
+				window_tlist = make_windowInputTargetList(root,
+														  tlist,
+														  activeWindows);
+
+				/*
+				 * The copyObject steps here are needed to ensure that each plan
+				 * node has a separately modifiable tlist.  (XXX wouldn't a
+				 * shallow list copy do for that?)
+				 */
+				result_plan->targetlist = (List *) copyObject(window_tlist);
+
+				foreach(l, activeWindows)
 				{
-					/* Install the original tlist in the topmost WindowAgg */
-					window_tlist = tlist;
-				}
+					WindowClause *wc = (WindowClause *) lfirst(l);
+					List	   *window_pathkeys;
+					int			partNumCols;
+					AttrNumber *partColIdx;
+					Oid		   *partOperators;
+					int			ordNumCols;
+					AttrNumber *ordColIdx;
+					Oid		   *ordOperators;
+
+					window_pathkeys = make_pathkeys_for_window(root,
+															   wc,
+															   tlist);
+
+					/*
+					 * This is a bit tricky: we build a sort node even if we don't
+					 * really have to sort.  Even when no explicit sort is needed,
+					 * we need to have suitable resjunk items added to the input
+					 * plan's tlist for any partitioning or ordering columns that
+					 * aren't plain Vars.  (In theory, make_windowInputTargetList
+					 * should have provided all such columns, but let's not assume
+					 * that here.)	Furthermore, this way we can use existing
+					 * infrastructure to identify which input columns are the
+					 * interesting ones.
+					 */
+					if (window_pathkeys)
+					{
+						Sort	   *sort_plan;
+
+						sort_plan = make_sort_from_pathkeys(root,
+															result_plan,
+															window_pathkeys,
+															-1.0);
+						if (!pathkeys_contained_in(window_pathkeys,
+												   current_pathkeys))
+						{
+							/* we do indeed need to sort */
+							result_plan = (Plan *) sort_plan;
+							current_pathkeys = window_pathkeys;
+						}
+						/* In either case, extract the per-column information */
+						get_column_info_for_window(root, wc, tlist,
+												   sort_plan->numCols,
+												   sort_plan->sortColIdx,
+												   &partNumCols,
+												   &partColIdx,
+												   &partOperators,
+												   &ordNumCols,
+												   &ordColIdx,
+												   &ordOperators);
+					}
+					else
+					{
+						/* empty window specification, nothing to sort */
+						partNumCols = 0;
+						partColIdx = NULL;
+						partOperators = NULL;
+						ordNumCols = 0;
+						ordColIdx = NULL;
+						ordOperators = NULL;
+					}
 
-				/* ... and make the WindowAgg plan node */
-				result_plan = (Plan *)
-					make_windowagg(root,
-								   (List *) copyObject(window_tlist),
-								   wflists->windowFuncs[wc->winref],
-								   wc->winref,
-								   partNumCols,
-								   partColIdx,
-								   partOperators,
-								   ordNumCols,
-								   ordColIdx,
-								   ordOperators,
-								   wc->frameOptions,
-								   wc->startOffset,
-								   wc->endOffset,
-								   result_plan);
+					if (lnext(l))
+					{
+						/* Add the current WindowFuncs to the running tlist */
+						window_tlist = add_to_flat_tlist(window_tlist,
+											   wflists->windowFuncs[wc->winref]);
+					}
+					else
+					{
+						/* Install the original tlist in the topmost WindowAgg */
+						window_tlist = tlist;
+					}
+
+					/* ... and make the WindowAgg plan node */
+					result_plan = (Plan *)
+						make_windowagg(root,
+									   (List *) copyObject(window_tlist),
+									   wflists->windowFuncs[wc->winref],
+									   wc->winref,
+									   partNumCols,
+									   partColIdx,
+									   partOperators,
+									   ordNumCols,
+									   ordColIdx,
+									   ordOperators,
+									   wc->frameOptions,
+									   wc->startOffset,
+									   wc->endOffset,
+									   result_plan);
+				}
 			}
-		}
+
+			result_plan_list = lappend(result_plan_list, result_plan);
+		}						 /* foreach final_rel_list */
 	}							/* end of if (setOperations) */
 
-	/*
-	 * If there is a DISTINCT clause, add the necessary node(s).
-	 */
-	if (parse->distinctClause)
+	foreach(lc, result_plan_list)
 	{
-		double		dNumDistinctRows;
-		long		numDistinctRows;
+		result_plan = (Plan *) lfirst(lc);
 
 		/*
-		 * If there was grouping or aggregation, use the current number of
-		 * rows as the estimated number of DISTINCT rows (ie, assume the
-		 * result was already mostly unique).  If not, use the number of
-		 * distinct-groups calculated previously.
+		 * If there is a DISTINCT clause, add the necessary node(s).
 		 */
-		if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
-			dNumDistinctRows = result_plan->plan_rows;
-		else
-			dNumDistinctRows = dNumGroups;
-
-		/* Also convert to long int --- but 'ware overflow! */
-		numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
-
-		/* Choose implementation method if we didn't already */
-		if (!tested_hashed_distinct)
+		if (parse->distinctClause)
 		{
-			/*
-			 * At this point, either hashed or sorted grouping will have to
-			 * work from result_plan, so we pass that as both "cheapest" and
-			 * "sorted".
-			 */
-			use_hashed_distinct =
-				choose_hashed_distinct(root,
-									   tuple_fraction, limit_tuples,
-									   result_plan->plan_rows,
-									   result_plan->plan_width,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   result_plan->startup_cost,
-									   result_plan->total_cost,
-									   current_pathkeys,
-									   dNumDistinctRows);
-		}
+			double		dNumDistinctRows;
+			long		numDistinctRows;
 
-		if (use_hashed_distinct)
-		{
-			/* Hashed aggregate plan --- no sort needed */
-			result_plan = (Plan *) make_agg(root,
-											result_plan->targetlist,
-											NIL,
-											AGG_HASHED,
-											NULL,
-										  list_length(parse->distinctClause),
-								 extract_grouping_cols(parse->distinctClause,
-													result_plan->targetlist),
-								 extract_grouping_ops(parse->distinctClause),
-											numDistinctRows,
-											result_plan);
-			/* Hashed aggregation produces randomly-ordered results */
-			current_pathkeys = NIL;
-		}
-		else
-		{
 			/*
-			 * Use a Unique node to implement DISTINCT.  Add an explicit sort
-			 * if we couldn't make the path come out the way the Unique node
-			 * needs it.  If we do have to sort, always sort by the more
-			 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
-			 * below.  However, for regular DISTINCT, don't sort now if we
-			 * don't have to --- sorting afterwards will likely be cheaper,
-			 * and also has the possibility of optimizing via LIMIT.  But for
-			 * DISTINCT ON, we *must* force the final sort now, else it won't
-			 * have the desired behavior.
+			 * If there was grouping or aggregation, use the current number of
+			 * rows as the estimated number of DISTINCT rows (ie, assume the
+			 * result was already mostly unique).  If not, use the number of
+			 * distinct-groups calculated previously.
 			 */
-			List	   *needed_pathkeys;
-
-			if (parse->hasDistinctOn &&
-				list_length(root->distinct_pathkeys) <
-				list_length(root->sort_pathkeys))
-				needed_pathkeys = root->sort_pathkeys;
+			if (parse->groupClause || root->hasHavingQual || parse->hasAggs)
+				dNumDistinctRows = result_plan->plan_rows;
 			else
-				needed_pathkeys = root->distinct_pathkeys;
+				dNumDistinctRows = dNumGroups;
+
+			/* Also convert to long int --- but 'ware overflow! */
+			numDistinctRows = (long) Min(dNumDistinctRows, (double) LONG_MAX);
+
+			/* Choose implementation method if we didn't already */
+			if (!tested_hashed_distinct)
+			{
+				/*
+				 * At this point, either hashed or sorted grouping will have to
+				 * work from result_plan, so we pass that as both "cheapest" and
+				 * "sorted".
+				 */
+				use_hashed_distinct =
+					choose_hashed_distinct(root,
+										   tuple_fraction, limit_tuples,
+										   result_plan->plan_rows,
+										   result_plan->plan_width,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   result_plan->startup_cost,
+										   result_plan->total_cost,
+										   current_pathkeys,
+										   dNumDistinctRows);
+			}
 
-			if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
+			if (use_hashed_distinct)
+			{
+				/* Hashed aggregate plan --- no sort needed */
+				result_plan = (Plan *) make_agg(root,
+												result_plan->targetlist,
+												NIL,
+												AGG_HASHED,
+												NULL,
+											  list_length(parse->distinctClause),
+									 extract_grouping_cols(parse->distinctClause,
+														result_plan->targetlist),
+									 extract_grouping_ops(parse->distinctClause),
+												numDistinctRows,
+												result_plan);
+				/* Hashed aggregation produces randomly-ordered results */
+				current_pathkeys = NIL;
+			}
+			else
 			{
-				if (list_length(root->distinct_pathkeys) >=
+				/*
+				 * Use a Unique node to implement DISTINCT.  Add an explicit sort
+				 * if we couldn't make the path come out the way the Unique node
+				 * needs it.  If we do have to sort, always sort by the more
+				 * rigorous of DISTINCT and ORDER BY, to avoid a second sort
+				 * below.  However, for regular DISTINCT, don't sort now if we
+				 * don't have to --- sorting afterwards will likely be cheaper,
+				 * and also has the possibility of optimizing via LIMIT.  But for
+				 * DISTINCT ON, we *must* force the final sort now, else it won't
+				 * have the desired behavior.
+				 */
+				List	   *needed_pathkeys;
+
+				if (parse->hasDistinctOn &&
+					list_length(root->distinct_pathkeys) <
 					list_length(root->sort_pathkeys))
-					current_pathkeys = root->distinct_pathkeys;
+					needed_pathkeys = root->sort_pathkeys;
 				else
+					needed_pathkeys = root->distinct_pathkeys;
+
+				if (!pathkeys_contained_in(needed_pathkeys, current_pathkeys))
 				{
-					current_pathkeys = root->sort_pathkeys;
-					/* Assert checks that parser didn't mess up... */
-					Assert(pathkeys_contained_in(root->distinct_pathkeys,
-												 current_pathkeys));
+					if (list_length(root->distinct_pathkeys) >=
+						list_length(root->sort_pathkeys))
+						current_pathkeys = root->distinct_pathkeys;
+					else
+					{
+						current_pathkeys = root->sort_pathkeys;
+						/* Assert checks that parser didn't mess up... */
+						Assert(pathkeys_contained_in(root->distinct_pathkeys,
+													 current_pathkeys));
+					}
+
+					result_plan = (Plan *) make_sort_from_pathkeys(root,
+																   result_plan,
+																current_pathkeys,
+																   -1.0);
 				}
 
+				result_plan = (Plan *) make_unique(result_plan,
+												   parse->distinctClause);
+				result_plan->plan_rows = dNumDistinctRows;
+				/* The Unique node won't change sort ordering */
+			}
+		}
+
+		/*
+		 * If ORDER BY was given and we were not able to make the plan come out in
+		 * the right order, add an explicit sort step.
+		 */
+		if (parse->sortClause)
+		{
+			if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+			{
 				result_plan = (Plan *) make_sort_from_pathkeys(root,
 															   result_plan,
-															current_pathkeys,
-															   -1.0);
+															 root->sort_pathkeys,
+															   limit_tuples);
+				current_pathkeys = root->sort_pathkeys;
 			}
-
-			result_plan = (Plan *) make_unique(result_plan,
-											   parse->distinctClause);
-			result_plan->plan_rows = dNumDistinctRows;
-			/* The Unique node won't change sort ordering */
 		}
-	}
 
-	/*
-	 * If ORDER BY was given and we were not able to make the plan come out in
-	 * the right order, add an explicit sort step.
-	 */
-	if (parse->sortClause)
-	{
-		if (!pathkeys_contained_in(root->sort_pathkeys, current_pathkeys))
+		/*
+		 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
+		 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
+		 * If there are only non-locking rowmarks, they should be handled by the
+		 * ModifyTable node instead.)
+		 */
+		if (parse->rowMarks)
 		{
-			result_plan = (Plan *) make_sort_from_pathkeys(root,
-														   result_plan,
-														 root->sort_pathkeys,
-														   limit_tuples);
-			current_pathkeys = root->sort_pathkeys;
-		}
-	}
+			result_plan = (Plan *) make_lockrows(result_plan,
+												 root->rowMarks,
+												 SS_assign_special_param(root));
 
-	/*
-	 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
-	 * (Note: we intentionally test parse->rowMarks not root->rowMarks here.
-	 * If there are only non-locking rowmarks, they should be handled by the
-	 * ModifyTable node instead.)
-	 */
-	if (parse->rowMarks)
-	{
-		result_plan = (Plan *) make_lockrows(result_plan,
-											 root->rowMarks,
-											 SS_assign_special_param(root));
+			/*
+			 * The result can no longer be assumed sorted, since locking might
+			 * cause the sort key columns to be replaced with new values.
+			 */
+			current_pathkeys = NIL;
+		}
 
 		/*
-		 * The result can no longer be assumed sorted, since locking might
-		 * cause the sort key columns to be replaced with new values.
+		 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
 		 */
-		current_pathkeys = NIL;
-	}
+		if (limit_needed(parse))
+		{
+			result_plan = (Plan *) make_limit(result_plan,
+											  parse->limitOffset,
+											  parse->limitCount,
+											  offset_est,
+											  count_est);
+		}
 
-	/*
-	 * Finally, if there is a LIMIT/OFFSET clause, add the LIMIT node.
-	 */
-	if (limit_needed(parse))
-	{
-		result_plan = (Plan *) make_limit(result_plan,
-										  parse->limitOffset,
-										  parse->limitCount,
-										  offset_est,
-										  count_est);
-	}
+		lfirst(lc) = result_plan;
+	} /* foreach result_plan_list */
 
 	/*
 	 * Return the actual output ordering in query_pathkeys for possible use by
@@ -2015,7 +2035,16 @@ grouping_planner(PlannerInfo *root, double tuple_fraction)
 	 */
 	root->query_pathkeys = current_pathkeys;
 
-	return result_plan;
+	/* if there is only one plan, then just return that plan */
+	if (list_length(result_plan_list) == 1)
+		return (Plan *) linitial(result_plan_list);
+
+	/*
+	 * Otherwise we'd better add an AlternativePlan node to allow the executor
+	 * to decide which plan to use.
+	 */
+	else
+		return (Plan *) make_alternativeplan(result_plan_list);
 }
 
 /*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ec828cd..cf7692a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -434,6 +434,17 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 	 */
 	switch (nodeTag(plan))
 	{
+		case T_AlternativePlan:
+			{
+				AlternativePlan *aplan = (AlternativePlan *) plan;
+				ListCell *lc;
+				foreach(lc, aplan->planList)
+				{
+					Plan *subplan = (Plan *) lfirst(lc);
+					lfirst(lc) = set_plan_refs(root, subplan, rtoffset);
+				}
+			}
+			break;
 		case T_SeqScan:
 			{
 				SeqScan    *splan = (SeqScan *) plan;
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 313a5c1..b41b965 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -25,7 +25,9 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_constraint.h"
 #include "catalog/heap.h"
+#include "catalog/pg_type.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -38,6 +40,7 @@
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -89,6 +92,12 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 	Relation	relation;
 	bool		hasindex;
 	List	   *indexinfos = NIL;
+	List	   *fkinfos = NIL;
+	Relation	fkeyRel;
+	Relation	fkeyRelIdx;
+	ScanKeyData fkeyScankey;
+	SysScanDesc fkeyScan;
+	HeapTuple	tuple;
 
 	/*
 	 * We need not lock the relation since it was already locked, either by
@@ -384,6 +393,111 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 
 	heap_close(relation, NoLock);
 
+	/* load foreign key constraints */
+	ScanKeyInit(&fkeyScankey,
+				Anum_pg_constraint_conrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(relationObjectId));
+
+	fkeyRel = heap_open(ConstraintRelationId, AccessShareLock);
+	fkeyRelIdx = index_open(ConstraintRelidIndexId, AccessShareLock);
+	fkeyScan = systable_beginscan_ordered(fkeyRel, fkeyRelIdx, NULL, 1, &fkeyScankey);
+
+	while ((tuple = systable_getnext_ordered(fkeyScan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
+		ForeignKeyInfo *fkinfo;
+		Datum		adatum;
+		bool		isNull;
+		ArrayType  *arr;
+		int			nelements;
+
+		/* skip if not a foreign key */
+		if (con->contype != CONSTRAINT_FOREIGN)
+			continue;
+
+		/* we're not interested unless the fkey has been validated */
+		if (!con->convalidated)
+			continue;
+
+		fkinfo = (ForeignKeyInfo *) palloc(sizeof(ForeignKeyInfo));
+		fkinfo->conindid = con->conindid;
+		fkinfo->confrelid = con->confrelid;
+		fkinfo->convalidated = con->convalidated;
+		fkinfo->conrelid = con->conrelid;
+		fkinfo->confupdtype = con->confupdtype;
+		fkinfo->confdeltype = con->confdeltype;
+		fkinfo->confmatchtype = con->confmatchtype;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "conkey is not a 1-D smallint array");
+
+		fkinfo->conkey = (int16 *) ARR_DATA_PTR(arr);
+		fkinfo->conncols = nelements;
+
+		adatum = heap_getattr(tuple, Anum_pg_constraint_confkey,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null confkey for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != INT2OID)
+			elog(ERROR, "confkey is not a 1-D smallint array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of confkey elements does not equal conkey elements");
+
+		fkinfo->confkey = (int16 *) ARR_DATA_PTR(arr);
+		adatum = heap_getattr(tuple, Anum_pg_constraint_conpfeqop,
+							RelationGetDescr(fkeyRel), &isNull);
+
+		if (isNull)
+			elog(ERROR, "null conpfeqop for constraint %u",
+				HeapTupleGetOid(tuple));
+
+		arr = DatumGetArrayTypeP(adatum);		/* ensure not toasted */
+		nelements = ARR_DIMS(arr)[0];
+
+		if (ARR_NDIM(arr) != 1 ||
+			nelements < 0 ||
+			ARR_HASNULL(arr) ||
+			ARR_ELEMTYPE(arr) != OIDOID)
+			elog(ERROR, "conpfeqop is not a 1-D Oid array");
+
+		/* sanity check */
+		if (nelements != fkinfo->conncols)
+			elog(ERROR, "number of conpfeqop elements does not equal conkey elements");
+
+		fkinfo->conpfeqop = (Oid *) ARR_DATA_PTR(arr);
+
+		fkinfos = lappend(fkinfos, fkinfo);
+	}
+
+	rel->fklist = fkinfos;
+	systable_endscan_ordered(fkeyScan);
+	index_close(fkeyRelIdx, AccessShareLock);
+	heap_close(fkeyRel, AccessShareLock);
+
 	/*
 	 * Allow a plugin to editorialize on the info we obtained from the
 	 * catalogs.  Actions might include altering the assumed relation size,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8cfbea0..0be29e6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -115,6 +115,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->lateral_relids = NULL;
 	rel->lateral_referencers = NULL;
 	rel->indexlist = NIL;
+	rel->fklist = NIL;
 	rel->pages = 0;
 	rel->tuples = 0;
 	rel->allvisfrac = 0;
@@ -127,6 +128,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptKind reloptkind)
 	rel->baserestrictcost.startup = 0;
 	rel->baserestrictcost.per_tuple = 0;
 	rel->joininfo = NIL;
+	rel->removal_flags = PLAN_SUITABILITY_ALL_PURPOSE;
 	rel->has_eclass_joins = false;
 
 	/* Check type of rtable entry */
@@ -377,6 +379,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->lateral_relids = NULL;
 	joinrel->lateral_referencers = NULL;
 	joinrel->indexlist = NIL;
+	joinrel->fklist = NIL;
 	joinrel->pages = 0;
 	joinrel->tuples = 0;
 	joinrel->allvisfrac = 0;
@@ -389,6 +392,7 @@ build_join_rel(PlannerInfo *root,
 	joinrel->baserestrictcost.startup = 0;
 	joinrel->baserestrictcost.per_tuple = 0;
 	joinrel->joininfo = NIL;
+	joinrel->removal_flags = PLAN_SUITABILITY_ALL_PURPOSE;
 	joinrel->has_eclass_joins = false;
 
 	/*
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 818c2f6..115e398 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -916,6 +916,33 @@ get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 	ReleaseSysCache(tp);
 }
 
+/*
+ * get_attnotnull
+ *
+ *		Given the relation id and the attribute number,
+ *		return the "attnotnull" field from the attribute relation.
+ */
+bool
+get_attnotnull(Oid relid, AttrNumber attnum)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache2(ATTNUM,
+						 ObjectIdGetDatum(relid),
+						 Int16GetDatum(attnum));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_attribute att_tup = (Form_pg_attribute) GETSTRUCT(tp);
+		bool		result;
+
+		result = att_tup->attnotnull;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return false;
+}
+
 /*				---------- COLLATION CACHE ----------					 */
 
 /*
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 1a53f6c..e02bad6 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -182,6 +182,7 @@ extern void ExecBSTruncateTriggers(EState *estate,
 extern void ExecASTruncateTriggers(EState *estate,
 					   ResultRelInfo *relinfo);
 
+extern bool AfterTriggerQueueIsEmpty(void);
 extern void AfterTriggerBeginXact(void);
 extern void AfterTriggerBeginQuery(void);
 extern void AfterTriggerEndQuery(EState *estate);
diff --git a/src/include/executor/nodeAlternativePlan.h b/src/include/executor/nodeAlternativePlan.h
new file mode 100644
index 0000000..6d830c8
--- /dev/null
+++ b/src/include/executor/nodeAlternativePlan.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeAlternativePlan.h
+ *
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/executor/nodeAlternativePlan.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef NODEALTERNATIVEPLAN_H
+#define NODEALTERNATIVEPLAN_H
+
+#include "nodes/execnodes.h"
+
+extern PlanState *ExecInitAlternativePlan(AlternativePlan *node,
+						EState *estate, int eflags);
+/*
+ * Note that this node is only ever seen during initialization of a plan and
+ * it has no state type.
+ */
+#endif   /* NODEALTERNATIVEPLAN_H */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 38469ef..29f20dd 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -77,6 +77,7 @@ typedef enum NodeTag
 	T_SetOp,
 	T_LockRows,
 	T_Limit,
+	T_AlternativePlan,
 	/* these aren't subclasses of Plan: */
 	T_NestLoopParam,
 	T_PlanRowMark,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 50e9829..84b4e8d 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -752,6 +752,10 @@ typedef enum RTEKind
 	RTE_CTE						/* common table expr (WITH list element) */
 } RTEKind;
 
+/* Bit flags to mark suitability of plans */
+#define PLAN_SUITABILITY_ALL_PURPOSE		0
+#define PLAN_SUITABILITY_FK_TRIGGER_EMPTY	1
+
 typedef struct RangeTblEntry
 {
 	NodeTag		type;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21cbfa8..ea7b3da 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,6 +72,7 @@ typedef struct PlannedStmt
 
 	bool		hasRowSecurity; /* row security applied? */
 
+	int			suitableFor; /* under which conditions can this plan be used */
 } PlannedStmt;
 
 /* macro for fetching the Plan associated with a SubPlan node */
@@ -768,6 +769,20 @@ typedef struct LockRows
 	int			epqParam;		/* ID of Param for EvalPlanQual re-eval */
 } LockRows;
 
+
+/* ----------------
+ *		alternative plan node
+ *
+ * Stores a list of alternative plans and one
+ * all purpose plan.
+ * ----------------
+ */
+typedef struct AlternativePlan
+{
+	Plan		plan;
+	List	   *planList;
+} AlternativePlan;
+
 /* ----------------
  *		limit node
  *
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 334cf51..f0ab0ec 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -95,6 +95,8 @@ typedef struct PlannerGlobal
 
 	int			nParamExec;		/* number of PARAM_EXEC Params used */
 
+	int			suitableFor; /* under which conditions can this plan be used */
+
 	Index		lastPHId;		/* highest PlaceHolderVar ID assigned */
 
 	Index		lastRowMarkId;	/* highest PlanRowMark ID assigned */
@@ -359,6 +361,8 @@ typedef struct PlannerInfo
  *		lateral_referencers - relids of rels that reference this one laterally
  *		indexlist - list of IndexOptInfo nodes for relation's indexes
  *					(always NIL if it's not a table)
+ *		fklist - list of ForeignKeyInfo's for relation's foreign key
+ *					constraints. (always NIL if it's not a table)
  *		pages - number of disk pages in relation (zero if not a table)
  *		tuples - number of tuples in relation (not considering restrictions)
  *		allvisfrac - fraction of disk pages that are marked all-visible
@@ -452,6 +456,7 @@ typedef struct RelOptInfo
 	Relids		lateral_relids; /* minimum parameterization of rel */
 	Relids		lateral_referencers;	/* rels that reference me laterally */
 	List	   *indexlist;		/* list of IndexOptInfo */
+	List	   *fklist;			/* list of ForeignKeyInfo */
 	BlockNumber pages;			/* size estimates derived from pg_class */
 	double		tuples;
 	double		allvisfrac;
@@ -469,6 +474,8 @@ typedef struct RelOptInfo
 	QualCost	baserestrictcost;		/* cost of evaluating the above */
 	List	   *joininfo;		/* RestrictInfo structures for join clauses
 								 * involving this rel */
+	int			removal_flags;		/* it may be possible to not bother joining
+									 * this relation at all */
 	bool		has_eclass_joins;		/* T means joininfo is incomplete */
 } RelOptInfo;
 
@@ -542,6 +549,51 @@ typedef struct IndexOptInfo
 	bool		amhasgetbitmap; /* does AM have amgetbitmap interface? */
 } IndexOptInfo;
 
+/*
+ * ForeignKeyInfo
+ *		Used to store pg_constraint records for foreign key constraints for use
+ *		by the planner.
+ *
+ *		conindid - The index which supports the foreign key
+ *
+ *		confrelid - The relation that is referenced by this foreign key
+ *
+ *		convalidated - True if the foreign key has been validated.
+ *
+ *		conrelid - The Oid of the relation that the foreign key belongs to
+ *
+ *		confupdtype - ON UPDATE action for when the referenced table is updated
+ *
+ *		confdeltype - ON DELETE action, controls what to do when a record is
+ *					deleted from the referenced table.
+ *
+ *		confmatchtype - foreign key match type, e.g MATCH FULL, MATCH PARTIAL
+ *
+ *		conncols - Number of columns defined in the foreign key
+ *
+ *		conkey - An array of conncols elements to store the varattno of the
+ *					columns on the referencing side of the foreign key
+ *
+ *		confkey - An array of conncols elements to store the varattno of the
+ *					columns on the referenced side of the foreign key
+ *
+ *		conpfeqop - An array of conncols elements to store the operators for
+ *					PK = FK comparisons
+ */
+typedef struct ForeignKeyInfo
+{
+	Oid			conindid;		/* index supporting this constraint */
+	Oid			confrelid;		/* relation referenced by foreign key */
+	bool		convalidated;	/* constraint has been validated? */
+	Oid			conrelid;		/* relation this constraint constrains */
+	char		confupdtype;	/* foreign key's ON UPDATE action */
+	char		confdeltype;	/* foreign key's ON DELETE action */
+	char		confmatchtype;	/* foreign key's match type */
+	int			conncols;		/* number of columns referenced */
+	int16	   *conkey;			/* Columns of conrelid that the constraint applies to */
+	int16	   *confkey;		/* columns of confrelid that foreign key references */
+	Oid		   *conpfeqop;		/* Operator list for comparing PK to FK */
+} ForeignKeyInfo;
 
 /*
  * EquivalenceClasses
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 6cad92e..7b040fa 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -37,7 +37,8 @@ typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
 extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
 
-extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
+extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist,
+								int removal_flags);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
@@ -119,6 +120,8 @@ extern List *generate_join_implied_equalities(PlannerInfo *root,
 								 Relids join_relids,
 								 Relids outer_relids,
 								 RelOptInfo *inner_rel);
+extern Oid select_equality_operator(EquivalenceClass *ec, Oid lefttype,
+								 Oid righttype);
 extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
 extern void add_child_rel_equivalences(PlannerInfo *root,
 						   AppendRelInfo *appinfo,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fa72918..d71c553 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -27,8 +27,9 @@ typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
 /*
  * prototypes for plan/planmain.c
  */
-extern RelOptInfo *query_planner(PlannerInfo *root, List *tlist,
-			  query_pathkeys_callback qp_callback, void *qp_extra);
+extern List *query_planner(PlannerInfo *root, List *tlist,
+			  query_pathkeys_callback qp_callback, void *qp_extra,
+			  bool all_purpose_plan_only);
 
 /*
  * prototypes for plan/planagg.c
@@ -73,6 +74,7 @@ extern Group *make_group(PlannerInfo *root, List *tlist, List *qual,
 extern Plan *materialize_finished_plan(Plan *subplan);
 extern Unique *make_unique(Plan *lefttree, List *distinctList);
 extern LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
+extern AlternativePlan *make_alternativeplan(List *planlist);
 extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount,
 		   int64 offset_est, int64 count_est);
 extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 2f5ede1..14e64fc 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -68,6 +68,7 @@ extern Oid	get_atttype(Oid relid, AttrNumber attnum);
 extern int32 get_atttypmod(Oid relid, AttrNumber attnum);
 extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 					  Oid *typid, int32 *typmod, Oid *collid);
+extern bool get_attnotnull(Oid relid, AttrNumber attnum);
 extern char *get_collation_name(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern Oid	get_opclass_family(Oid opclass);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 57fc910..51b22b2 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -3301,6 +3301,171 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 (1 row)
 
 rollback;
+begin work;
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+ id | b_id 
+----+------
+  3 |    2
+(1 row)
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on a
+(1 row)
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (val = 10)
+(6 rows)
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+            QUERY PLAN            
+----------------------------------
+ Hash Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+               Filter: (id = val)
+(6 rows)
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+         QUERY PLAN         
+----------------------------
+ Hash Join
+   Hash Cond: (a.id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+          QUERY PLAN          
+------------------------------
+ Hash Right Join
+   Hash Cond: (a.b_id = b.id)
+   ->  Seq Scan on a
+   ->  Hash
+         ->  Seq Scan on b
+(5 rows)
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+        QUERY PLAN         
+---------------------------
+ Nested Loop
+   ->  Seq Scan on c
+   ->  Materialize
+         ->  Seq Scan on a
+(4 rows)
+
+set constraints b_c_id_fkey deferred;
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+explain (costs off)
+execute ab;
+  QUERY PLAN   
+---------------
+ Seq Scan on b
+(1 row)
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+explain (costs off)
+execute ab;
+          QUERY PLAN          
+------------------------------
+ Hash Join
+   Hash Cond: (b.c_id = c.id)
+   ->  Seq Scan on b
+   ->  Hash
+         ->  Seq Scan on c
+(5 rows)
+
+rollback;
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 06a27ea..e3a3314 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -990,6 +990,89 @@ select i8.* from int8_tbl i8 left join (select f1 from int4_tbl group by f1) i4
 
 rollback;
 
+begin work;
+
+create temp table c (
+  id int primary key
+);
+create temp table b (
+  id int primary key,
+  c_id int not null,
+  val int not null,
+  constraint b_c_id_fkey foreign key (c_id) references c deferrable
+);
+create temp table a (
+  id int primary key,
+  b_id int not null,
+  constraint a_b_id_fkey foreign key (b_id) references b deferrable
+);
+
+insert into c (id) values(1);
+insert into b (id,c_id,val) values(2,1,10);
+insert into a (id,b_id) values(3,2);
+
+-- this should remove inner join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id;
+
+-- this should remove inner join to b and c
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- Ensure all of the target entries have their proper aliases.
+select a.* from a inner join b on a.b_id = b.id inner join c on b.c_id = c.id;
+
+-- change order of tables in query, this should generate the same plan as above.
+explain (costs off)
+select a.* from c inner join b on c.id = b.c_id inner join a on a.b_id = b.id;
+
+-- inner join can't be removed due to b columns in the target list
+explain (costs off)
+select * from a inner join b on a.b_id = b.id;
+
+-- this should not remove inner join to b due to quals restricting results from b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = 10;
+
+-- this should not remove join to b
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id where b.val = b.id;
+
+-- this should not remove the join, no foreign key exists between a.id and b.id
+explain (costs off)
+select a.* from a inner join b on a.id = b.id;
+
+-- ensure a left joined rel can't remove an inner joined rel
+explain (costs off)
+select a.* from b left join a on b.id = a.b_id;
+
+-- Ensure we remove b, but don't try and remove c. c has no join condition.
+explain (costs off)
+select a.* from a inner join b on a.b_id = b.id cross join c;
+
+set constraints b_c_id_fkey deferred;
+
+-- join should be removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+prepare ab as select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+-- perform an update which will cause some pending fk triggers to be added
+update c set id = 2 where id=1;
+
+-- ensure inner join is no longer removed.
+explain (costs off)
+select b.* from b inner join c on b.c_id = c.id;
+
+explain (costs off)
+execute ab;
+
+rollback;
+
 create temp table parent (k int primary key, pd int);
 create temp table child (k int unique, cd int);
 insert into parent values (1, 10), (2, 20), (3, 30);

#51

Simon Riggs

simon@2ndQuadrant.com

almost 11 years ago

In reply to: David Rowley (#50)

Re: Removing INNER JOINs

On 16 March 2015 at 09:55, David Rowley <dgrowleyml@gmail.com> wrote:

I think it's probably possible to do this, but I think it would require
calling make_one_rel() with every combination of the possibly removable
relations included and not included in the join list. I'm thinking this
could end up being a lot of work, as the number of calls to make_one_rel()
would be N^2, where N is the number of relations that may be removable.

My line of thought was more that the backup/all-purpose plan will only be
used in very rare cases: either when a foreign key has been deferred, or
when the query is executed from within a volatile function called by an
UPDATE statement which has just modified the table, causing a foreign key
trigger to be queued. I'm willing to bet someone does this somewhere in the
world, but the query that's run would also have to have a removable join.
(One of the regression tests I've added exercises this.)

For that reason I thought it was best to generate only 2 plans: one with
*all* possible removable rels removed, and a backup one with nothing removed
which will be executed if there are any FK triggers queued up.

Agreed, just 2 plans.
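
As an illustration only, the two-plan idea can be sketched as a bitmask test:
each candidate plan records the conditions it depends on, and the first plan
whose conditions all hold is used. The two flag values are copied from the
patch; CandidatePlan, choose_plan() and conditions_met are hypothetical names
invented for this sketch and are not part of the patch or of PostgreSQL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in the patch's parsenodes.h hunk. */
#define PLAN_SUITABILITY_ALL_PURPOSE		0
#define PLAN_SUITABILITY_FK_TRIGGER_EMPTY	1

/* Hypothetical stand-in for a finished plan plus the conditions it needs. */
typedef struct CandidatePlan
{
	const char *label;
	int			suitableFor;	/* bitmask of conditions that must hold */
} CandidatePlan;

/*
 * Return the first candidate whose required conditions are all present in
 * conditions_met.  An all-purpose plan requires nothing, so it always
 * matches; listing it last makes it the fallback.
 */
static const CandidatePlan *
choose_plan(const CandidatePlan *plans, size_t nplans, int conditions_met)
{
	for (size_t i = 0; i < nplans; i++)
	{
		if ((plans[i].suitableFor & ~conditions_met) == 0)
			return &plans[i];
	}
	return NULL;				/* no candidates were supplied */
}
```

With the join-removed plan listed first and the all-purpose plan second, the
caller passes PLAN_SUITABILITY_FK_TRIGGER_EMPTY when no FK triggers are queued
and 0 otherwise, and gets back the optimised plan or the fallback respectively.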

The two ways of doing this look massively different in the EXPLAIN output.
With the method the patch currently implements, only 1 of the 2 alternative
plans is seen by EXPLAIN. This is because I've coded
ExecInitAlternativePlan() to return the root node of only 1 of the 2 plans.
If I had kept the AlternativePlan node around then the EXPLAIN output would
have 2 plans, both sitting under the AlternativePlan node.

I guess that is at least compatible with how we currently handle other
join elimination, so that is acceptable to me.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52

Tom Lane

tgl@sss.pgh.pa.us

almost 11 years ago

In reply to: David Rowley (#50)

Re: Removing INNER JOINs

David Rowley <dgrowleyml@gmail.com> writes:

I'm not worried about the cost of the decision at plan init time. I was
just confused about what Tom was recommending. I couldn't quite decide from
his email if he meant to keep the alternative plan node around so that the
executor must transition through it for each row, or to just choose the
proper plan at executor init and return the actual root of the selected
plan instead of returning the initialised AlternativePlan node (see
nodeAlternativePlan.c)

TBH, I don't like anything about this patch and would prefer to reject it
altogether. I think it's a disaster from a system structural standpoint,
will probably induce nasty bugs, and simply doesn't apply to enough
real-world queries to be worth either the pain of making it work or the
planning overhead it will add.

The structural damage could be reduced if you got rid of the far-too-cute
idea that you could cut the AlternativePlan node out of the plan tree at
executor startup time. Just have it remember which decision it made and
pass down all the calls to the appropriate child node. What you're doing
here violates the rule that planstate trees have a one-to-one relationship
to plan trees. EXPLAIN used to iterate over those trees in lockstep, and
there probably still is code that does similar things (in third-party
modules if not core), so I don't think we should abandon that principle.
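
A minimal sketch of the "remember the decision and pass calls down" design,
using toy types rather than the real executor structures. Every name here
(ToyNode, ToyAlternativeState, toy_alternative_init and so on) is hypothetical;
the point is only that the wrapper node stays in the tree, so the state tree
keeps the same shape as the plan tree, and each call is forwarded to the child
chosen at init time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified executor node: not PostgreSQL's PlanState. */
typedef struct ToyNode
{
	/* Returns true and sets *row while rows remain, false at end. */
	bool		(*next) (struct ToyNode *self, int *row);
	void	   *state;
} ToyNode;

/* Leaf node for the demo: emits the same value a fixed number of times. */
typedef struct ToyLeafState
{
	int			value;
	int			remaining;
} ToyLeafState;

static bool
toy_leaf_next(ToyNode *self, int *row)
{
	ToyLeafState *ls = (ToyLeafState *) self->state;

	if (ls->remaining <= 0)
		return false;
	ls->remaining--;
	*row = ls->value;
	return true;
}

typedef struct ToyAlternativeState
{
	ToyNode    *children[2];	/* [0] = joins removed, [1] = all purpose */
	int			chosen;			/* decided once at init, then remembered */
} ToyAlternativeState;

/* Forward the call to whichever child was selected at init time. */
static bool
toy_alternative_next(ToyNode *self, int *row)
{
	ToyAlternativeState *as = (ToyAlternativeState *) self->state;
	ToyNode    *child = as->children[as->chosen];

	return child->next(child, row);
}

static void
toy_alternative_init(ToyNode *node, ToyAlternativeState *as,
					 ToyNode *removed, ToyNode *all_purpose,
					 bool fk_triggers_pending)
{
	as->children[0] = removed;
	as->children[1] = all_purpose;
	as->chosen = fk_triggers_pending ? 1 : 0;
	node->next = toy_alternative_next;
	node->state = as;
}
```

The cost of this design is one extra indirect call per row; in exchange, code
that walks both trees in lockstep still finds a node where it expects one.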

Also, the patch seems rather schizophrenic about whether it's relying
on run-time decisions or not; in particular I see no use of the
PlannedStmt.suitableFor field, and no reason to have one if there's
to be an AlternativePlan node making the decision.

I'm just about certain that you can't generate the two alternative plans
simply by calling make_one_rel() twice. The planner has far too much
tendency to scribble on its input data structures for that to work
reliably. It's possible you could dodge bugs of that sort, and save
some cycles, if you did base relation planning only once and created the
alternatives only while working at the join-relation level. Even then
I think you'd need to do pushups like what GEQO does in order to repeat
join-level planning. (While I'm looking at that: what used to be a
reasonably clear API between query_planner and grouping_planner now seems
like a complete mess, and I'm quite certain you've created multiple bugs
in grouping_planner, because it would need to track more than one e.g.
current_pathkeys value for the subplans created for the different
final_rels. You can't just arbitrarily do some of those steps in one loop
and the others in a totally different loop.)

As far as the fundamental-bugs issue goes, I have no faith that "there are
no pending AFTER trigger events" is a sufficient condition for "all known
foreign-key constraints hold against whatever-random-snapshot-I'm-using";
and it's certainly not a *necessary* condition. (BTW the patch should be
doing its best to localize knowledge about that, rather than spreading the
assumption far and wide through comments and choices of flag/variable
names.) I think the best you can hope to say in that line is that if
there were no pending events at the time the snapshot was taken, it might
be all right. Maybe. But it's not hard to imagine the trigger queue
getting emptied while snapshots still exist that can see inconsistent
states that prevailed before the triggers fired. (Remember that a trigger
might restore consistency by applying additional changes, not just by
throwing an error.) STABLE functions are the pain point here since they
could execute queries with snapshots from quite some time back.

As for the cost issue, that's an awful lot of code you added to
remove_useless_joins(). I'm concerned about how much time it's going to
spend while failing to prove anything for most queries. For starters,
it looks to be O(N^2) in the number of relations, which the existing
join_is_removable code is not; and in general it looks like it will do
work in many more common cases than the existing code has to. I'm also
pretty unhappy about having get_relation_info() physically visiting the
pg_constraint catalog for every relation that's ever scanned by any query.
(You could probably teach the relcache to remember data about foreign-key
constraints, instead of doing it this way.)
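
To illustrate the caching suggestion in general terms, here is a toy
per-relation cache that performs the expensive lookup once and reuses the
result until it is invalidated. It is not the relcache API: ToyOid,
toy_scan_catalog(), toy_get_fkey_count() and toy_invalidate() are hypothetical
names, and the "catalog scan" is a made-up function that only counts how often
it is called.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8

typedef unsigned int ToyOid;

typedef struct ToyFkEntry
{
	bool		valid;
	ToyOid		relid;
	int			nfkeys;			/* stand-in for a list of ForeignKeyInfo */
} ToyFkEntry;

static ToyFkEntry toy_cache[TOY_CACHE_SIZE];
static int	toy_catalog_scans = 0;

/* Hypothetical expensive lookup, standing in for a pg_constraint scan. */
static int
toy_scan_catalog(ToyOid relid)
{
	toy_catalog_scans++;
	return (int) (relid % 3);	/* made-up answer for the demo */
}

/* Return cached data when present, scanning the "catalog" only on a miss. */
static int
toy_get_fkey_count(ToyOid relid)
{
	ToyFkEntry *entry = &toy_cache[relid % TOY_CACHE_SIZE];

	if (!entry->valid || entry->relid != relid)
	{
		entry->nfkeys = toy_scan_catalog(relid);
		entry->relid = relid;
		entry->valid = true;
	}
	return entry->nfkeys;
}

/* Must be called whenever the relation's constraints change. */
static void
toy_invalidate(ToyOid relid)
{
	ToyFkEntry *entry = &toy_cache[relid % TOY_CACHE_SIZE];

	if (entry->valid && entry->relid == relid)
		entry->valid = false;
}
```

Repeated planning against the same relation then costs one scan rather than
one scan per query, provided that invalidation is wired to every event that
can add, drop or alter a foreign key.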

regards, tom lane
