Merge Append Patch merged up to 85devel

Started by Gregory Starkover 16 years ago7 messages
#1Gregory Stark
stark@mit.edu
1 attachment(s)

Here's a copy of the merge-append patch that I sent months ago merged up to
head. I haven't really added any additional functionality since then.

Heikki suggested I separate the Append and MergeAppend nodes into two executor
nodes. I had that half done in my tree but looking it over it leads to a lot
of duplicated code and a strange effect that there's on Path node but two
Executor nodes which seems strange. I'm not sure which way to go here but at
least for now I'm leaving it this way since it's less code to write. If we
want it the other way to commit then I'll do it.

The other pending question is the same I had back when I originally submitted
it. I don't really understand what's going on with eclasses and what
invariants we're aiming to maintain with them. I don't see a problem tossing
all the child relation attributes into the same eclass even though they're not
strictly speaking "equivalent". No join above the append path is going to see
the child attributes anyways. But that might be shortsighted as I'm not really
sure what the consequences are and what other uses we have envisioned for
eclasses in the future.

Attachments:

merge-append-85-v1.difftext/x-diffDownload
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 0938e94..cf0f3a1 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -59,9 +59,25 @@
 
 #include "executor/execdebug.h"
 #include "executor/nodeAppend.h"
+#include "utils/lsyscache.h"
+#include "access/nbtree.h"
+
+/* It gets quite confusing having a heap array (indexed by integers) which
+ * contains integers which index into the slots array. These typedefs try to
+ * clear it up but without making simple inline accessing functions they don't
+ * actually produce any warnings on mistakes */
+
+typedef int SlotNumber;
+typedef int HeapPosition;
+
+#define WHICHPLAN_PLANS_UNINITIALIZED (-1)
 
 static bool exec_append_initialize_next(AppendState *appendstate);
 
+static int heap_compare_slots(AppendState *node, SlotNumber slot1, SlotNumber slot2);
+
+static void heap_siftup_slot(AppendState *node);
+static void heap_insert_slot(AppendState *node, SlotNumber new_slot);
 
 /* ----------------------------------------------------------------
  *		exec_append_initialize_next
@@ -177,12 +193,14 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 
 		appendstate->as_firstplan = tplan;
 		appendstate->as_lastplan = tplan;
+		appendstate->as_is_ordered = false;
 	}
 	else
 	{
 		/* normal case, scan all subplans */
 		appendstate->as_firstplan = 0;
 		appendstate->as_lastplan = nplans - 1;
+		appendstate->as_is_ordered = node->isOrdered;
 	}
 
 	/*
@@ -224,11 +242,50 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
 	ExecAssignResultTypeFromTL(&appendstate->ps);
 	appendstate->ps.ps_ProjInfo = NULL;
 
-	/*
-	 * return the result from the first subplan's initialization
-	 */
-	appendstate->as_whichplan = appendstate->as_firstplan;
-	exec_append_initialize_next(appendstate);
+	if (!appendstate->as_is_ordered) 
+	{
+		/*
+		 * return the result from the first subplan's initialization
+		 */
+		appendstate->as_whichplan = appendstate->as_firstplan;
+		exec_append_initialize_next(appendstate);
+	} else {
+		/* set up scan keys and initialize *all* the subnodes */
+		int i;
+
+		appendstate->as_nkeys = node->numCols;
+		appendstate->as_scankeys = palloc(sizeof(ScanKeyData) *  node->numCols);
+		appendstate->as_slots = palloc(sizeof(TupleTableSlot *) * nplans);
+		appendstate->as_heap = palloc(sizeof(int) * nplans);
+		appendstate->as_heap_size = 0;
+
+		for (i=0; i < nplans; i++)
+		{
+			appendstate->as_whichplan = i;
+			exec_append_initialize_next(appendstate);
+		}
+
+		appendstate->as_whichplan = WHICHPLAN_PLANS_UNINITIALIZED;
+
+		for (i=0; i < node->numCols; i++)
+		{
+			Oid sortFunction;
+			bool reverse;
+
+			get_compare_function_for_ordering_op(node->sortOperators[i],
+												 &sortFunction, &reverse);
+
+			ScanKeyInit(&appendstate->as_scankeys[i],
+						node->sortColIdx[i],
+						InvalidStrategy,
+						sortFunction,
+						(Datum)0);
+			if (reverse)
+				appendstate->as_scankeys[i].sk_flags |= SK_BT_DESC;
+			if (node->nullsFirst[i])
+				appendstate->as_scankeys[i].sk_flags |= SK_BT_NULLS_FIRST;
+		}
+	}
 
 	return appendstate;
 }
@@ -253,47 +310,168 @@ ExecCountSlotsAppend(Append *node)
 TupleTableSlot *
 ExecAppend(AppendState *node)
 {
-	for (;;)
-	{
-		PlanState  *subnode;
-		TupleTableSlot *result;
+	if (!node->as_is_ordered)
+		for (;;)
+		{
+			PlanState  *subnode;
+			TupleTableSlot *result;
 
-		/*
-		 * figure out which subplan we are currently processing
-		 */
-		subnode = node->appendplans[node->as_whichplan];
+			/*
+			 * figure out which subplan we are currently processing
+			 */
+			subnode = node->appendplans[node->as_whichplan];
 
-		/*
-		 * get a tuple from the subplan
-		 */
-		result = ExecProcNode(subnode);
+			/*
+			 * get a tuple from the subplan
+			 */
+			result = ExecProcNode(subnode);
+
+			if (!TupIsNull(result))
+			{
+				/*
+				 * If the subplan gave us something then return it as-is. We do
+				 * NOT make use of the result slot that was set up in
+				 * ExecInitAppend, first because there's no reason to and second
+				 * because it may have the wrong tuple descriptor in
+				 * inherited-UPDATE cases.
+				 */
+				return result;
+			}
 
-		if (!TupIsNull(result))
-		{
 			/*
-			 * If the subplan gave us something then return it as-is. We do
-			 * NOT make use of the result slot that was set up in
-			 * ExecInitAppend, first because there's no reason to and second
-			 * because it may have the wrong tuple descriptor in
-			 * inherited-UPDATE cases.
+			 * Go on to the "next" subplan in the appropriate direction. If no
+			 * more subplans, return the empty slot set up for us by
+			 * ExecInitAppend.
 			 */
-			return result;
+			if (ScanDirectionIsForward(node->ps.state->es_direction))
+				node->as_whichplan++;
+			else
+				node->as_whichplan--;
+			if (!exec_append_initialize_next(node))
+				return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+
+			/* Else loop back and try to get a tuple from the new subplan */
+		}
+	else
+	{
+		TupleTableSlot *result;
+		SlotNumber i;
+	
+		if (node->as_whichplan == WHICHPLAN_PLANS_UNINITIALIZED)
+		{
+			for (i=0; i<node->as_nplans; i++)
+			{
+				node->as_slots[i] = ExecProcNode(node->appendplans[i]);
+				if (!TupIsNull(node->as_slots[i]))
+					heap_insert_slot(node, i);
+			}
 		}
-
-		/*
-		 * Go on to the "next" subplan in the appropriate direction. If no
-		 * more subplans, return the empty slot set up for us by
-		 * ExecInitAppend.
-		 */
-		if (ScanDirectionIsForward(node->ps.state->es_direction))
-			node->as_whichplan++;
 		else
-			node->as_whichplan--;
-		if (!exec_append_initialize_next(node))
-			return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+		{
+			i = node->as_whichplan;
+			node->as_slots[i] = ExecProcNode(node->appendplans[i]);
+			if (TupIsNull(node->as_slots[i]))
+				;
+			else if (node->as_heap_size <= 0 || heap_compare_slots(node, node->as_heap[0], i) >= 0)
+				return node->as_slots[i];
+			else 
+				heap_insert_slot(node, i);
+		}
+		
+		if (node->as_heap_size > 0)
+		{
+			i = node->as_heap[0];
+			node->as_whichplan = i;
+			heap_siftup_slot(node);
+			result = node->as_slots[i];
+		} else {
+			result = ExecClearTuple(node->ps.ps_ResultTupleSlot);		
+		}
 
-		/* Else loop back and try to get a tuple from the new subplan */
+		return result;
+	}
+}
+
+static void
+heap_insert_slot(AppendState *node, SlotNumber new_slot)
+{
+	HeapPosition j;
+	SlotNumber *heap;
+	
+	Assert(!TupIsNull(node->as_slots[new_slot]));
+
+	j = node->as_heap_size++;
+	heap = node->as_heap;
+	
+	while (j>0)
+	{
+		int i = (j-1)/2;
+		if (heap_compare_slots(node, new_slot, node->as_heap[i]) >= 0)
+			break;
+		heap[j] = heap[i];
+		j=i;
+	}
+	heap[j] = new_slot;
+}
+
+static void
+heap_siftup_slot(AppendState *node)
+{
+	HeapPosition i=0, j, n;
+	
+	if (--node->as_heap_size <= 0)
+		return;
+	n = node->as_heap_size;
+
+	for (;;) {
+		j = 2 * i + 1;
+		if (j >= n)
+			break;
+		if (j+1 < n && heap_compare_slots(node, node->as_heap[j], node->as_heap[j+1]) > 0)
+			j++;
+		if (heap_compare_slots(node, node->as_heap[n], node->as_heap[j]) <= 0)
+			break;
+		node->as_heap[i] = node->as_heap[j];
+		i=j;
+	}
+	node->as_heap[i] = node->as_heap[n];
+}
+
+static int
+heap_compare_slots(AppendState *node, SlotNumber slot1, SlotNumber slot2)
+{
+	int nkey;
+	TupleTableSlot *s1, *s2;
+	
+	Assert(slot1 < node->as_nplans);
+	Assert(slot2 < node->as_nplans);
+
+	s1 = node->as_slots[slot1];
+	s2 = node->as_slots[slot2];
+	Assert(!TupIsNull(s1));
+	Assert(!TupIsNull(s2));
+
+	for (nkey = 0; nkey < node->as_nkeys; nkey++) 
+	{
+		ScanKey scankey = node->as_scankeys + nkey;
+		AttrNumber attno = scankey->sk_attno;
+		Datum d1, d2;
+		bool  n1, n2;
+		int compare;
+
+		d1 = slot_getattr(s1, attno, &n1);
+		d2 = slot_getattr(s2, attno, &n2);
+		
+		if (n1 && !n2)
+			return (scankey->sk_flags & SK_BT_NULLS_FIRST) ? -1 :  1;
+		if (!n1 && n2)
+			return (scankey->sk_flags & SK_BT_NULLS_FIRST) ?  1 : -1;
+		
+		compare = FunctionCall2(&scankey->sk_func, d1, d2);
+		if (compare != 0)
+			return (scankey->sk_flags & SK_BT_DESC) ? -compare : compare;
 	}
+	return 0;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 942ab46..2f24bd1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -50,7 +50,7 @@ static void set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					   RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 						Index rti, RangeTblEntry *rte);
-static void set_dummy_rel_pathlist(RelOptInfo *rel);
+static void set_dummy_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
 static void set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 					  Index rti, RangeTblEntry *rte);
 static void set_function_pathlist(PlannerInfo *root, RelOptInfo *rel,
@@ -221,7 +221,7 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	if (rel->reloptkind == RELOPT_BASEREL &&
 		relation_excluded_by_constraints(root, rel, rte))
 	{
-		set_dummy_rel_pathlist(rel);
+		set_dummy_rel_pathlist(root, rel);
 		return;
 	}
 
@@ -266,6 +266,22 @@ set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
 	set_cheapest(rel);
 }
 
+
+/* It's possible that the child is itself an appendrel, in which case
+ * we can "cut out the middleman" and just add its child paths to our
+ * own list.  (We don't try to do this earlier because we need to
+ * apply both levels of transformation to the quals.)
+ */
+
+#define LAPPEND_PATH_FLATTEN_APPENDPATHS(subpaths_, childpath_)			\
+	do {																\
+		if (IsA((childpath_), AppendPath))								\
+			(subpaths_) = list_concat((subpaths_),						\
+									 ((AppendPath *) (childpath_))->subpaths); \
+		else															\
+			subpaths_ = lappend(subpaths_, childpath_);					\
+	} while (0)
+
 /*
  * set_append_rel_pathlist
  *	  Build access paths for an "append relation"
@@ -282,12 +298,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 						Index rti, RangeTblEntry *rte)
 {
 	int			parentRTindex = rti;
-	List	   *subpaths = NIL;
+	List	   *best_subpaths = NIL;	/* list of paths */
+	List	   *all_pathkeys = NIL;		/* union of all pathkeys in sub paths */
 	double		parent_rows;
 	double		parent_size;
 	double	   *parent_attrsizes;
 	int			nattrs;
-	ListCell   *l;
+	ListCell   *l, *ll, *lll;
 
 	/*
 	 * Initialize to compute size estimates for whole append relation.
@@ -354,7 +371,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 			 * appendrel.  Mark it with a dummy cheapest-path though, in case
 			 * best_appendrel_indexscan() looks at it later.
 			 */
-			set_dummy_rel_pathlist(childrel);
+			set_dummy_rel_pathlist(root, childrel);
 			continue;
 		}
 
@@ -370,7 +387,9 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 		 * We have to make child entries in the EquivalenceClass data
 		 * structures as well.
 		 */
+#ifdef FIXME
 		if (rel->has_eclass_joins)
+#endif
 		{
 			add_child_rel_equivalences(root, appinfo, rel, childrel);
 			childrel->has_eclass_joins = true;
@@ -386,21 +405,34 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 
 		/*
 		 * Compute the child's access paths, and add the cheapest one to the
-		 * Append path we are constructing for the parent.
-		 *
-		 * It's possible that the child is itself an appendrel, in which case
-		 * we can "cut out the middleman" and just add its child paths to our
-		 * own list.  (We don't try to do this earlier because we need to
-		 * apply both levels of transformation to the quals.)
+ 		 * list of cheap paths (best_paths) and the pathkeys of all the paths
+ 		 * to the union, all_pathkeys.
 		 */
 		set_rel_pathlist(root, childrel, childRTindex, childRTE);
 
 		childpath = childrel->cheapest_total_path;
-		if (IsA(childpath, AppendPath))
-			subpaths = list_concat(subpaths,
-								   ((AppendPath *) childpath)->subpaths);
-		else
-			subpaths = lappend(subpaths, childpath);
+ 		LAPPEND_PATH_FLATTEN_APPENDPATHS(best_subpaths, childpath);
+ 
+		/* gather up all the pathkeys of all sub-paths
+		 * FIXME -- this is O(n^2)!
+		 */
+		foreach(ll, childrel->pathlist)
+		{
+			ListCell *lll;
+			Path *childpath = (Path *) lfirst(ll);
+			bool found = false;
+			
+			foreach (lll, all_pathkeys)
+			{
+				List *existing_pathkeys = (List *)lfirst(lll);
+ 
+				if (compare_pathkeys(existing_pathkeys,
+									 childpath->pathkeys) == PATHKEYS_EQUAL)
+					found = true;
+			}
+			if (!found)
+				all_pathkeys = lappend(all_pathkeys, childpath->pathkeys);
+		}
 
 		/*
 		 * Accumulate size information from each child.
@@ -460,10 +492,74 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	 * the parent rel.	(Note: this is correct even if we have zero or one
 	 * live subpath due to constraint exclusion.)
 	 */
-	add_path(rel, (Path *) create_append_path(rel, subpaths));
+	if (list_length(best_subpaths) == 1)
+	{
+		/* special case for a singleton append node, just grab the pathlist
+		 * from the child rels and use it directly.
+		 */
+		rel->pathlist = ((Path*) linitial(best_subpaths))->parent->pathlist;
+		set_cheapest(rel);
+		return;
+	}
 
-	/* Select cheapest path (pretty easy in this case...) */
-	set_cheapest(rel);
+	/* First add the unordered append path */
+	add_path(rel, (Path *) create_append_path(root, rel, best_subpaths, NIL));
+
+	/* Now try to add various ordered append paths */
+	foreach (ll, all_pathkeys)
+	{
+		List *pathkeys = (List *)lfirst(ll);
+		List *startup_subpaths = NIL; /* subpaths for minimal startup cost append path */
+		List *total_subpaths = NIL;   /* subpaths for minimal total cost append path */
+
+		/* populate both subpaths lists based on this pathkeys list */
+		foreach(lll, root->append_rel_list)
+		{
+			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lll);
+			RelOptInfo *childrel;
+			Path	   *childpath;
+
+			/* append_rel_list contains all append rels; ignore others */
+			if (appinfo->parent_relid != parentRTindex)
+				continue;
+
+			childrel = find_base_rel(root, appinfo->child_relid);
+			Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
+
+			/* First the cheapest startup cost */
+			childpath = get_cheapest_path_for_pathkeys(childrel->pathlist, 
+													   pathkeys,
+													   TOTAL_COST);
+
+			/* If we can't find any plans with the right order just add the
+			 * cheapest total plan to both paths, we'll have to sort it
+			 * anyways. Perhaps consider not generating a startup path for such
+			 * pathkeys since it'll be unlikely to be the cheapest startup
+			 * cost. */
+
+			if (!childpath) {
+				childpath = childrel->cheapest_total_path;
+				LAPPEND_PATH_FLATTEN_APPENDPATHS(startup_subpaths, childpath);
+				LAPPEND_PATH_FLATTEN_APPENDPATHS(total_subpaths, childpath);
+				continue;
+			}
+
+			LAPPEND_PATH_FLATTEN_APPENDPATHS(total_subpaths, childpath);
+
+			/* Also do the the cheapest startup cost */
+			childpath = get_cheapest_path_for_pathkeys(childrel->pathlist, 
+													   pathkeys,
+													   STARTUP_COST);
+			Assert(childpath); /* above special case ought to have caught this */
+			LAPPEND_PATH_FLATTEN_APPENDPATHS(startup_subpaths, childpath);
+		}
+
+		add_path(rel, (Path *) create_append_path(root, rel, startup_subpaths, pathkeys));
+		add_path(rel, (Path *) create_append_path(root, rel, total_subpaths, pathkeys));
+	}
+
+	/* Select cheapest path */
+  	set_cheapest(rel);
 }
 
 /*
@@ -474,13 +570,13 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  * AppendPath with no members (see also IS_DUMMY_PATH macro).
  */
 static void
-set_dummy_rel_pathlist(RelOptInfo *rel)
+set_dummy_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
 {
 	/* Set dummy size estimates --- we leave attr_widths[] as zeroes */
 	rel->rows = 0;
 	rel->width = 0;
 
-	add_path(rel, (Path *) create_append_path(rel, NIL));
+	add_path(rel, (Path *) create_append_path(root, rel, NIL, NIL));
 
 	/* Select cheapest path (pretty easy in this case...) */
 	set_cheapest(rel);
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index cc36cad..ca7db58 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1617,9 +1617,10 @@ add_child_rel_equivalences(PlannerInfo *root,
 		 * Won't generate joinclauses if const or single-member (the latter
 		 * test covers the volatile case too)
 		 */
+#if FIXME
 		if (cur_ec->ec_has_const || list_length(cur_ec->ec_members) <= 1)
 			continue;
-
+#endif
 		/* No point in searching if parent rel not mentioned in eclass */
 		if (!bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
 			continue;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 8e5767d..d1bf767 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -939,7 +939,7 @@ best_appendrel_indexscan(PlannerInfo *root, RelOptInfo *rel,
 		return NULL;
 
 	/* Form and return the completed Append path. */
-	return (Path *) create_append_path(rel, append_paths);
+	return (Path *) create_append_path(root, rel, append_paths, NIL);
 }
 
 /*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 1007cf0..0ff6f7d 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -943,7 +943,7 @@ mark_dummy_rel(RelOptInfo *rel)
 	rel->pathlist = NIL;
 
 	/* Set up the dummy path */
-	add_path(rel, (Path *) create_append_path(rel, NIL));
+	add_path(rel, (Path *) create_append_path(NULL, rel, NIL, NIL));
 
 	/* Set or update cheapest_total_path */
 	set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 762dfb1..c40e9ff 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -24,6 +24,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
+#include "optimizer/paths.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planmain.h"
 #include "optimizer/predtest.h"
@@ -570,11 +571,20 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path)
 	foreach(subpaths, best_path->subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(subpaths);
+		Plan	   *subplan;
 
-		subplans = lappend(subplans, create_plan(root, subpath));
+		subplan = create_plan(root, subpath);
+		
+		if (!pathkeys_contained_in(best_path->path.pathkeys, subpath->pathkeys))
+			subplan = (Plan *)make_sort_from_pathkeys(root, 
+													  subplan, 
+													  best_path->path.pathkeys, 
+													  -1.0);
+
+		subplans = lappend(subplans, subplan);
 	}
 
-	plan = make_append(subplans, false, tlist);
+	plan = make_append(root, subplans, false, tlist, best_path->path.pathkeys);
 
 	return (Plan *) plan;
 }
@@ -2542,17 +2552,20 @@ make_worktablescan(List *qptlist,
 }
 
 Append *
-make_append(List *appendplans, bool isTarget, List *tlist)
+make_append(PlannerInfo *root, List *appendplans, 
+			bool isTarget, List *tlist, List *pathkeys)
 {
 	Append	   *node = makeNode(Append);
 	Plan	   *plan = &node->plan;
 	double		total_size;
 	ListCell   *subnode;
-
+	
 	/*
-	 * Compute cost as sum of subplan costs.  We charge nothing extra for the
-	 * Append itself, which perhaps is too optimistic, but since it doesn't do
-	 * any selection or projection, it is a pretty cheap node.
+	 * Compute cost as sum of subplan costs. We charge nothing extra for a
+	 * plain Append itself, which perhaps is too optimistic, but since it
+	 * doesn't do any selection or projection, it is a pretty cheap node. In
+	 * the case of an ordered append we construct an equivalent bounded Sort
+	 * node and steal the cost calculations from it.
 	 */
 	plan->startup_cost = 0;
 	plan->total_cost = 0;
@@ -2562,8 +2575,10 @@ make_append(List *appendplans, bool isTarget, List *tlist)
 	{
 		Plan	   *subplan = (Plan *) lfirst(subnode);
 
-		if (subnode == list_head(appendplans))	/* first node? */
-			plan->startup_cost = subplan->startup_cost;
+ 		/* If it's ordered then the startup cost is the sum of the startup
+ 		 * costs, otherwise it's the startup cost of just the first plan */
+ 		if (pathkeys || subnode == list_head(appendplans))
+ 			plan->startup_cost += subplan->startup_cost;
 		plan->total_cost += subplan->total_cost;
 		plan->plan_rows += subplan->plan_rows;
 		total_size += subplan->plan_width * subplan->plan_rows;
@@ -2580,6 +2595,30 @@ make_append(List *appendplans, bool isTarget, List *tlist)
 	node->appendplans = appendplans;
 	node->isTarget = isTarget;
 
+	if (!pathkeys)
+		node->isOrdered = false;
+	else 
+	{
+		/* generate a throwaway sort node to find the sort functions */
+		Sort *tmp = make_sort_from_pathkeys(root, (Plan*)node, pathkeys, list_length(appendplans));
+		
+		node->isOrdered = true;
+		
+		node->numCols 		= tmp->numCols;
+		node->sortColIdx 	= tmp->sortColIdx;
+		node->sortOperators = tmp->sortOperators;
+		node->nullsFirst 	= tmp->nullsFirst;
+
+		/* a limited sort is the same kind of work (bounded heap sort) as an
+		 * ordered append with the bound set to the number of plans, so we just
+		 * use that to calculate the total cost. The startup cost is just the
+		 * sum of the startup costs of the nodes plus a bit to build the heap
+		 * (similar to processing that many rows in the bounded sort case).
+		 */
+		plan->total_cost = tmp->plan.total_cost - (enable_sort ? 0 : disable_cost);
+		plan->startup_cost += plan->total_cost / plan->plan_rows * list_length(appendplans);
+	}
+
 	return node;
 }
 
@@ -2937,9 +2976,10 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys,
 			{
 				EquivalenceMember *em = (EquivalenceMember *) lfirst(j);
 
+#ifdef FIXME
 				if (em->em_is_const || em->em_is_child)
 					continue;
-
+#endif
 				tle = tlist_member((Node *) em->em_expr, tlist);
 				if (tle)
 				{
@@ -2973,8 +3013,10 @@ make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys,
 					List	   *exprvars;
 					ListCell   *k;
 
+#ifdef FIXME
 					if (em->em_is_const || em->em_is_child)
 						continue;
+#endif
 					sortexpr = em->em_expr;
 					exprvars = pull_var_clause((Node *) sortexpr,
 											   PVC_INCLUDE_PLACEHOLDERS);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0a8d783..4053df3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -741,7 +741,7 @@ inheritance_planner(PlannerInfo *root)
 	if (list_length(subplans) == 1)
 		return (Plan *) linitial(subplans);
 
-	return (Plan *) make_append(subplans, true, tlist);
+	return (Plan *) make_append(root, subplans, true, tlist, NIL);
 }
 
 /*--------------------
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 5f9d9db..cac4064 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -448,7 +448,7 @@ generate_union_plan(SetOperationStmt *op, PlannerInfo *root,
 	/*
 	 * Append the child results together.
 	 */
-	plan = (Plan *) make_append(planlist, false, tlist);
+	plan = (Plan *) make_append(root, planlist, false, tlist, NIL);
 
 	/*
 	 * For UNION ALL, we just need the Append plan.  For UNION, need to add
@@ -539,7 +539,7 @@ generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root,
 	/*
 	 * Append the child results together.
 	 */
-	plan = (Plan *) make_append(planlist, false, tlist);
+	plan = (Plan *) make_append(root, planlist, false, tlist, NIL);
 
 	/* Identify the grouping semantics */
 	groupList = generate_setop_grouplist(op, tlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index b7c3d3c..6e237d2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -636,31 +636,74 @@ create_tidscan_path(PlannerInfo *root, RelOptInfo *rel, List *tidquals)
 /*
  * create_append_path
  *	  Creates a path corresponding to an Append plan, returning the
- *	  pathnode.
+ *	  pathnode. 
+ *
+ *    Note that we must support create_append_path(null, rel, nil, nil) which
+ *    is used to create dummy plans.
  */
+#define LOG2(x)  (log(x) / 0.693147180559945)
+
 AppendPath *
-create_append_path(RelOptInfo *rel, List *subpaths)
+create_append_path(PlannerInfo *root, RelOptInfo *rel, List *subpaths, List *pathkeys)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
 	ListCell   *l;
+	Cost total_cost=0, startup_cost=0;
+	Cost this_total_cost, this_startup_cost;
+	int npaths = 0;
 
 	pathnode->path.pathtype = T_Append;
 	pathnode->path.parent = rel;
-	pathnode->path.pathkeys = NIL;		/* result is always considered
-										 * unsorted */
+	pathnode->path.pathkeys = pathkeys;
 	pathnode->subpaths = subpaths;
 
-	pathnode->path.startup_cost = 0;
-	pathnode->path.total_cost = 0;
 	foreach(l, subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
-		if (l == list_head(subpaths))	/* first node? */
-			pathnode->path.startup_cost = subpath->startup_cost;
-		pathnode->path.total_cost += subpath->total_cost;
+		if (pathkeys_contained_in(pathkeys, subpath->pathkeys))  
+		{
+			this_startup_cost = subpath->startup_cost;
+			this_total_cost   = subpath->total_cost;
+		} else {
+			Path sort_path; /* dummy for result of cost_sort */
+			cost_sort(&sort_path,
+					  root,
+					  pathkeys,
+					  subpath->total_cost, 
+					  subpath->parent->tuples, 
+					  subpath->parent->width, 
+					  -1.0);
+			this_total_cost   = sort_path.total_cost;
+			this_startup_cost = sort_path.startup_cost;
+		}
+		
+		npaths++;
+		total_cost += this_total_cost;
+		/* If it's unsorted the startup cost is just the first subpath's
+		 * startup cost, otherwise it's the sum of all startup costs */
+		if (pathkeys != NIL || l == list_head(subpaths))
+			startup_cost += this_startup_cost;
+	}
+
+	/* The cost of merging using a heapsort */
+	if (pathkeys != NIL)
+	{
+		Path sort_path;
+		cost_sort(&sort_path,
+				  root,
+				  pathkeys,
+				  total_cost,
+				  rel->rows,
+				  rel->width, 
+				  npaths);
+		total_cost = sort_path.total_cost - (enable_sort ? 0 : disable_cost);
+		startup_cost += total_cost / rel->rows * npaths;
 	}
 
+	pathnode->path.total_cost = total_cost;
+	pathnode->path.startup_cost = startup_cost;
+	
 	return pathnode;
 }
 
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4b16723..ef42ce6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -994,6 +994,13 @@ typedef struct AppendState
 	int			as_whichplan;
 	int			as_firstplan;
 	int			as_lastplan;
+
+	bool		as_is_ordered;
+	int			as_nkeys;
+	ScanKey		as_scankeys;		/* array of length as_nkeys */
+	TupleTableSlot **as_slots;      /* array of length as_nplans */
+	int			*as_heap;			/* array of length as_nplans */
+	int			as_heap_size;       /* how many slots are present in the heap */
 } AppendState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 23a5117..acc399e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -180,6 +180,14 @@ typedef struct Append
 	Plan		plan;
 	List	   *appendplans;
 	bool		isTarget;
+	bool		isOrdered;
+	
+	/* used only if the append is an ordered append */
+
+	int			numCols;		/* number of sort-key columns */
+	AttrNumber *sortColIdx;		/* their indexes in the target list */
+	Oid		   *sortOperators;	/* OIDs of operators to sort them by */
+	bool	   *nullsFirst;		/* NULLS FIRST/LAST directions */
 } Append;
 
 /* ----------------
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 0f4c52e..4e167ec 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -46,7 +46,10 @@ extern BitmapOrPath *create_bitmap_or_path(PlannerInfo *root,
 					  List *bitmapquals);
 extern TidPath *create_tidscan_path(PlannerInfo *root, RelOptInfo *rel,
 					List *tidquals);
-extern AppendPath *create_append_path(RelOptInfo *rel, List *subpaths);
+extern AppendPath *create_append_path(PlannerInfo *root,
+									  RelOptInfo *rel, 
+									  List *subpaths,
+									  List *pathkeys);
 extern ResultPath *create_result_path(List *quals);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..1622810 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -41,7 +41,8 @@ extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
 extern Plan *create_plan(PlannerInfo *root, Path *best_path);
 extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual,
 				  Index scanrelid, Plan *subplan, List *subrtable);
-extern Append *make_append(List *appendplans, bool isTarget, List *tlist);
+extern Append *make_append(PlannerInfo *root, List *appendplans, 
+						   bool isTarget, List *tlist, List *pathkeys);
 extern RecursiveUnion *make_recursive_union(List *tlist,
 					 Plan *lefttree, Plan *righttree, int wtParam,
 					 List *distinctList, long numGroups);
#2Robert Haas
robertmhaas@gmail.com
In reply to: Gregory Stark (#1)
Re: Merge Append Patch merged up to 85devel

On Jul 5, 2009, at 10:02 AM, Gregory Stark <stark@mit.edu> wrote:

Here's a copy of the merge-append patch that I sent months ago
merged up to
head. I haven't really added any additional functionality since then.

Heikki suggested I separate the Append and MergeAppend nodes into
two executor
nodes. I had that half done in my tree but looking it over it leads
to a lot
of duplicated code and a strange effect that there's on Path node
but two
Executor nodes which seems strange. I'm not sure which way to go
here but at
least for now I'm leaving it this way since it's less code to write.
If we
want it the other way to commit then I'll do it.

The other pending question is the same I had back when I originally
submitted
it. I don't really understand what's going on with eclasses and what
invariants we're aiming to maintain with them. I don't see a problem
tossing
all the child relation attributes into the same eclass even though
they're not
strictly speaking "equivalent". No join above the append path is
going to see
the child attributes anyways. But that might be shortsighted as I'm
not really
sure what the consequences are and what other uses we have
envisioned for
eclasses in the future.

Can you provide some more details about the objective of this patch?
Or a link to previous discussion?

Thanks,

...Robert

#3Greg Stark
stark@mit.edu
In reply to: Robert Haas (#2)
Re: Merge Append Patch merged up to 85devel

On Sun, Jul 5, 2009 at 10:34 PM, Robert Haas<robertmhaas@gmail.com> wrote:

On Jul 5, 2009, at 10:02 AM, Gregory Stark <stark@mit.edu> wrote:

Here's a copy of the merge-append patch that I sent months ago merged up to
head. I haven't really added any additional functionality since then.

Can you provide some more details about the objective of this patch?  Or a
link to previous discussion?

It's one piece of the puzzle of how to deal with partitioned tables
more completely.

The basic problem is that currently the planner assumes all Append
nodes produce unordered output. That means there's no way for the
planner to use any indexes on partitions to produce a desired
ordering. That means it's impossible to do a merge join or to satisfy
an ORDER BY clause without introducing a sort even if you have
matching indexes on every partition.

Lots of common cases arise with partitioned tables where this kicks
in. Many of them will be eliminated as our partitioning support gets
more intelligent and is able to recognize how the partitioning key
relates to the requested order or collapse singleton partition scans
in cases where the parent table currently interferes. However there
will always be cases where those mechanisms to simplify the plan
sufficiently and we end up with an Append node of unordered partitions
and we need this as a back-stop to avoid awful plans.

The merge-append node works by teaching the Append node what sort
order it's aiming to produce. The append node then keeps a heap of
slots and returns the tuple from the top child plan replacing it in
the heap with the next plan.

This produces plans like this:

QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.20..489.00 rows=9600 width=4)
-> Append (cost=0.20..489.00 rows=9600 width=4)
-> Index Scan using p_pkey on p (cost=0.00..80.25 rows=2400 width=4)
-> Index Scan using p1_pkey on p1 p (cost=0.00..80.25
rows=2400 width=4)
-> Index Scan using p2_pkey on p2 p (cost=0.00..80.25
rows=2400 width=4)
-> Index Scan using p3_pkey on p3 p (cost=0.00..80.25
rows=2400 width=4)
(6 rows)

Instead of plans like this which is the best we can do today:

QUERY PLAN
--------------------------------------------------------------------------
Sort (cost=770.98..794.98 rows=9600 width=4)
Sort Key: public.p.i
-> Result (cost=0.00..136.00 rows=9600 width=4)
-> Append (cost=0.00..136.00 rows=9600 width=4)
-> Seq Scan on p (cost=0.00..34.00 rows=2400 width=4)
-> Seq Scan on p1 p (cost=0.00..34.00 rows=2400 width=4)
-> Seq Scan on p2 p (cost=0.00..34.00 rows=2400 width=4)
-> Seq Scan on p3 p (cost=0.00..34.00 rows=2400 width=4)
(8 rows)

--
greg
http://mit.edu/~gsstark/resume.pdf

#4Teodor Sigaev
teodor@sigaev.ru
In reply to: Robert Haas (#2)
Re: Merge Append Patch merged up to 85devel

Can you provide some more details about the objective of this patch? Or
a link to previous discussion?

Suppose, Greg's patch could be modified to support order OR index scans:
SELECT ... WHERE (c > 10 AND c < 20) OR (c > 100 AND C < 110) ORDER BY c DESC

with plan:
Result
-> Append
-> Backward index scan (c > 100 AND C < 110)
-> Backward index scan (c > 10 AND C < 20)

I suggested a similar patch two years ago:
http://archives.postgresql.org/message-id/45742C51.9020602@sigaev.ru (4-th
point) and subsequent discussion
http://archives.postgresql.org/pgsql-patches/2006-12/msg00029.php

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#5Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Greg Stark (#3)
1 attachment(s)
Re: Merge Append Patch merged up to 85devel

On Sun, Jul 5, 2009 at 7:23 PM, Greg Stark<stark@mit.edu> wrote:

Here's a copy of the merge-append patch that I sent months ago merged up to
head. I haven't really added any additional functionality since then.

Can you provide some more details about the objective of this patch?  Or a
link to previous discussion?

i was trying to test this one but i can't find a query that produces a
diferent plan than in 8.4.0, attached my current test just in case...
what kind of query is this intended to help?

something, maybe style dependant, that i don't like is the definition
of LAPPEND_PATH_FLATTEN_APPENDPATHS macro at some point in the middle
of the file i prefer they be defined at the top (just my
preference)... or there is a reason for doing it there?

another thing a don't like is those #ifdef FIXME surrounding already
existing if, why are those? and if they need to be fixed why there
isn't a comment explaining what the fix is or what it should behave?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Attachments:

test-mergeappend.sqltext/x-sql; charset=US-ASCII; name=test-mergeappend.sqlDownload
#6Greg Stark
stark@mit.edu
In reply to: Jaime Casanova (#5)
Re: Merge Append Patch merged up to 85devel

On Sat, Jul 25, 2009 at 8:12 PM, Jaime
Casanova<jcasanov@systemguards.com.ec> wrote:

i was trying to test this one but i can't find a query that produces a
diferent plan than in 8.4.0, attached my current test just in case...
what kind of query is this intended to help?

You may have to disable enable_seqscan to get simple examples like this to work:

select * from partitioned_table order by indexed_column;

more complex examples would trigger it naturally such as:

select * from partitioned_table where active order by indexed_column
(with an index on indexed_column where active)

or

select * from partitioned_table where indexed_column between x and y
order by indexed_column

something, maybe style dependant, that i don't like is the definition
of  LAPPEND_PATH_FLATTEN_APPENDPATHS macro at some point in the middle
of the file i prefer they be defined at the top (just my
preference)... or there is a reason for doing it there?

well it's only used in that one function, it's just some code which is
repeated three times and would obscure what's going on if it were
inlined.

another thing a don't like is those #ifdef FIXME surrounding already
existing if, why are those? and if they need to be fixed why there
isn't a comment explaining what the fix is or what it should behave?

Yeah, if I knew how to fix them then this patch wouldn't be stuck
waiting for feedback... :(

--
greg
http://mit.edu/~gsstark/resume.pdf

#7Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Greg Stark (#6)
3 attachment(s)
Re: Merge Append Patch merged up to 85devel

On Sat, Jul 25, 2009 at 2:26 PM, Greg Stark<stark@mit.edu> wrote:

more complex examples would trigger it naturally such as:

select * from partitioned_table where active order by indexed_column
(with an index on indexed_column where active)

or

select * from partitioned_table where indexed_column between x and y
order by indexed_column

look at the example, i had three OR'ed conditions on col1 wich is
indexed (in the script those where commented but i tried it first) and
i get the same plan than in 8.4

but you're right with "enable_seqscan to off" i get a better plan,
attaching explain analyze in 8.4 and 8.5 (with seqscan to on and off)

another thing a don't like is those #ifdef FIXME surrounding already
existing if, why are those? and if they need to be fixed why there
isn't a comment explaining what the fix is or what it should behave?

Yeah, if I knew how to fix them then this patch wouldn't be stuck
waiting for feedback... :(

and what's the problem with those if? as someone says before feel free
to speak slowly and draw pictures ;)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Attachments:

explain_analyze_84.out.gzapplication/x-gzip; name=explain_analyze_84.out.gzDownload
explain_analyze_85dev_seqscan_off.out.gzapplication/x-gzip; name=explain_analyze_85dev_seqscan_off.out.gzDownload
explain_analyze_85dev_seqscan_on.out.gzapplication/x-gzip; name=explain_analyze_85dev_seqscan_on.out.gzDownload