GROUP BY before JOIN

Started by David Rowley, over 10 years ago. 4 messages.
#1 David Rowley
david.rowley@2ndquadrant.com
1 attachment(s)

== Overview ==

As of today we always perform the GROUP BY at the final stage, after every
relation in the query has been joined. This of course works, but it's not
always the optimal way to execute the query.

Consider the following two relations:

create table product (product_id int primary key, code varchar(32) not null);
create table sale (sale_id int primary key, product_id int not null, qty int not null);
create index on sale (product_id);

Populated with:

insert into product select x.x, 'ABC' || x.x from generate_series(1,100) x(x);
insert into sale select x.x, x.x % 100 + 1, 10 from generate_series(1,1000000) x(x);

Here we have a table with 100 products and another with 1 million sales
records.

If we wanted to see the total sales for each product we may write a query
such as:

select s.product_id,sum(qty) as qty from sale s inner join product p on
s.product_id = p.product_id group by s.product_id;

If we look at the explain for this query we can see that the grouping
happened after the join took place:

                    QUERY PLAN
--------------------------------------------------
 HashAggregate
   Group Key: s.product_id
   ->  Hash Join
         Hash Cond: (s.product_id = p.product_id)
         ->  Seq Scan on sale s
         ->  Hash
               ->  Seq Scan on product p

The join here would produce exactly 1 million rows, and the HashAggregate
would then group those 1 million rows.

If it were possible for PostgreSQL to perform the GROUP BY before the join,
we could instead group the 1 million rows first, then join just the
resulting 100 rows.

Of course we could have written the query as:

select s.product_id,qty from (select product_id,sum(qty) as qty from sale
group by product_id) s inner join product p on s.product_id = p.product_id;

This does exactly that, and in this particular case it is much faster. But
it's not all rainbows and unicorns: if the join happened to filter out many
of the groups, it might not be a win at all.

Consider:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id where p.code = 'ABC1' group by s.product_id;

Since product.code has no equivalence member in the sale table, the
predicate cannot be pushed down to the sale table. So if, in this case, we'd
written the query as:

select s.product_id,qty from (select product_id,sum(qty) as qty from sale
group by product_id) s inner join product p on s.product_id = p.product_id
where p.code = 'ABC1';

We'd have thrown away 99% of the groups created in the subquery.
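To make the tradeoff concrete, here's a toy cost model of the two strategies (the per-tuple weights and formulas are my own illustrative assumptions, not PostgreSQL's actual costing functions):

```python
# Toy cost model for the early- vs late-grouping tradeoff (illustrative
# assumptions only, not the planner's real cost_agg / join costing).

JOIN_COST_PER_TUPLE = 1.0  # assumed cost to push one row through the join
AGG_COST_PER_TUPLE = 2.0   # assumed cost to feed one row into the aggregate

def cost_late_grouping(rows, join_selectivity):
    """Join every row first, then aggregate only the surviving rows."""
    join = rows * JOIN_COST_PER_TUPLE
    agg = rows * join_selectivity * AGG_COST_PER_TUPLE
    return join + agg

def cost_early_grouping(rows, n_groups):
    """Aggregate every row first, then join only the resulting groups."""
    agg = rows * AGG_COST_PER_TUPLE
    join = n_groups * JOIN_COST_PER_TUPLE
    return agg + join

rows, groups = 1_000_000, 100

# Non-filtering join (the sum(qty) query above): early grouping wins.
assert cost_early_grouping(rows, groups) < cost_late_grouping(rows, 1.0)

# Highly selective join (p.code = 'ABC1' keeps ~1% of the rows): late
# grouping wins, since most of the early-grouping work is thrown away.
assert cost_late_grouping(rows, 0.01) < cost_early_grouping(rows, groups)
```

Under these assumed weights the crossover depends entirely on the join's selectivity against the grouping input, which is exactly why the planner needs to cost both orders rather than always picking one.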

== Proposal ==

I've been working on allowing the planner to properly cost both options and
choose to perform the GROUP BY at the join level estimated to be most
efficient.

So far I've not gotten as far as supporting queries which contain aggregate
functions, as I need the ability to combine aggregate states (Simon and I
proposed a patch for that here https://commitfest.postgresql.org/5/131/).

Performance of the above test case is looking quite good so far:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id group by s.product_id;

-- Master = 312.294 ms
-- Patched = 95.328 ms

So roughly a 3.3x speedup in this particular case.

== Implementation ==

I've had to make some choices about how exactly to implement this feature.
As I explained above, we can't just always perform the grouping at the first
possible opportunity, because a later join may eliminate groups, making the
early grouping work needless.

To support this in the planner I've invented a GroupingPath type, and I've
also added 'cheapest_sorted_group_path' and 'cheapest_group_path' to
RelOptInfo. This part I wasn't too sure about: I wondered if these
GroupingPaths should just be added to the normal list with add_path(), but
since these paths perform more work than normal paths, that would give them
an unfair disadvantage and they could be thrown out. These paths only
become possibly cheaper after a subsequent JOIN has taken place.

There's also an issue with row estimates being wrong on joinrels: the row
estimate uses the base rel's row count rather than the number of groups. As
yet I'm unsure of the best way to fix this.
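One possible direction, purely as a sketch (the function and its signature are invented for illustration and don't appear in the patch), is to clamp the already-grouped side to its estimated group count when sizing the joinrel:

```python
def joinrel_rows(outer_rows, inner_rows, selectivity, outer_groups=None):
    """Toy joinrel size estimate (invented for illustration).

    When the outer side has already been grouped below the join, its
    effective row count is clamped to the estimated number of groups
    rather than using the base rel's row count.
    """
    if outer_groups is not None:
        outer_rows = min(outer_rows, outer_groups)
    return outer_rows * inner_rows * selectivity

# Base rel estimate: 1M sales x 100 products at 1/100 join selectivity.
ungrouped = joinrel_rows(1_000_000, 100, 1 / 100)
# After early grouping, only ~100 groups reach the join.
grouped = joinrel_rows(1_000_000, 100, 1 / 100, outer_groups=100)
```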

I wanted to post this early so as to register the fact that I'm working on
this, but I'm also posting in the hope of getting some early feedback on
what I have so far.

Of course there's lots more work to do here: aggregates need to be
supported, and functional dependencies and transitive closures will also
need to be detected in a more complete implementation.

Here's what I have so far (attached).

Comments are most welcome. Thanks.

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

groupbeforejoin_v0.1.patch (application/octet-stream)
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 81725d6..d3f4ea4 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1615,6 +1615,21 @@ _outIndexPath(StringInfo str, const IndexPath *node)
 }
 
 static void
+_outGroupingPath(StringInfo str, const GroupingPath *node)
+{
+	WRITE_NODE_TYPE("GROUPINGPATH");
+
+	_outPathInfo(str, (const Path *) node);
+
+	WRITE_NODE_FIELD(subpath);
+	WRITE_ENUM_FIELD(aggstrategy, AggStrategy);
+	WRITE_NODE_FIELD(group_clause);
+	WRITE_NODE_FIELD(group_exprs);
+	WRITE_BOOL_FIELD(combine_states);
+	WRITE_BOOL_FIELD(finalize_aggs);
+}
+
+static void
 _outBitmapHeapPath(StringInfo str, const BitmapHeapPath *node)
 {
 	WRITE_NODE_TYPE("BITMAPHEAPPATH");
@@ -3236,6 +3251,9 @@ _outNode(StringInfo str, const void *obj)
 			case T_IndexPath:
 				_outIndexPath(str, obj);
 				break;
+			case T_GroupingPath:
+				_outGroupingPath(str, obj);
+				break;
 			case T_BitmapHeapPath:
 				_outBitmapHeapPath(str, obj);
 				break;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8fc1cfd..727f49f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -177,6 +177,58 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 }
 
 /*
+ * build_grouping_paths
+ *
+ * When tryGroupBeforeJoin is set to true we make attempts to perform
+ * grouping before the final plan is generated. In many cases, when the
+ * group by clause contains a true sub-set of all joined relations this
+ * will be a big win, but there's also cases where it's not.
+ * If the join was to filter out a large portion of the would be groups,
+ * then there's less chance of this being useful, whereas if the join
+ * does not filter, or the joining rel's restrictinfo does not filter many
+ * of the groups, then grouping before joining may be a big win if the
+ * grouping reduces the number of tuples that need to be joined.
+ *
+ * Here we're only able to attempt a grouping if all of the relations that
+ * are required by the grouping clause are joined already.
+ */
+void
+build_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+	if (enable_earlygrouping == true &&
+		root->tryGroupBeforeJoin &&
+		bms_is_subset(root->minimumGroupingRels, rel->relids))
+	{
+		if (root->parse->groupClause)
+		{
+			rel->cheapest_sorted_group_path = create_grouping_path(root,
+				rel, AGG_SORTED, root->parse->groupClause, false, false);
+
+			rel->cheapest_group_path = create_grouping_path(root, rel,
+				AGG_HASHED, root->parse->groupClause, false, false);
+
+			/*
+			 * if we managed to generate a path for each, then we'll set the
+			 * cheapest_group_path to the lowest cost path.
+			 */
+			if (rel->cheapest_sorted_group_path != NULL &&
+				rel->cheapest_group_path != NULL)
+			{
+				if (compare_path_costs(&rel->cheapest_sorted_group_path->path,
+									   &rel->cheapest_group_path->path,
+									   TOTAL_COST) < 0)
+					rel->cheapest_group_path = rel->cheapest_sorted_group_path;
+			}
+		}
+		else if (root->parse->hasAggs)
+		{
+			rel->cheapest_group_path = create_grouping_path(root, rel,
+				AGG_PLAIN, NIL, false, false);
+		}
+	}
+}
+
+/*
  * set_base_rel_consider_startup
  *	  Set the consider_[param_]startup flags for each base-relation entry.
  *
@@ -436,6 +488,9 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 
+	/* Attempt to build some early grouping paths */
+	build_grouping_paths(root, rel);
+
 #ifdef OPTIMIZER_DEBUG
 	debug_print_rel(root, rel);
 #endif
@@ -1819,6 +1874,9 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
 
+			/* attempt to build some early grouping paths */
+			build_grouping_paths(root, rel);
+
 #ifdef OPTIMIZER_DEBUG
 			debug_print_rel(root, rel);
 #endif
@@ -2555,6 +2613,9 @@ print_path(PlannerInfo *root, Path *path, int indent)
 		case T_IndexPath:
 			ptype = "IdxScan";
 			break;
+		case T_GroupingPath:
+			ptype = "GroupingPath";
+			break;
 		case T_BitmapHeapPath:
 			ptype = "BitmapHeapScan";
 			break;
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 7069f60..7eb54e7 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -118,6 +118,7 @@ bool		enable_nestloop = true;
 bool		enable_material = true;
 bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
+bool		enable_earlygrouping = true;
 
 typedef struct
 {
@@ -1639,7 +1640,8 @@ cost_agg(Path *path, PlannerInfo *root,
 	else
 	{
 		/* must be AGG_HASHED */
-		startup_cost = input_total_cost;
+		startup_cost = input_startup_cost;
+		startup_cost += input_total_cost;
 		startup_cost += aggcosts->transCost.startup;
 		startup_cost += aggcosts->transCost.per_tuple * input_tuples;
 		startup_cost += (cpu_operator_cost * numGroupCols) * input_tuples;
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index ba78252..47c2191 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -701,6 +701,20 @@ sort_inner_and_outer(PlannerInfo *root,
 						   innerkeys,
 						   jointype,
 						   extra);
+
+		if (outerrel->cheapest_sorted_group_path != NULL)
+		{
+			try_mergejoin_path(root,
+							   joinrel,
+							   &outerrel->cheapest_sorted_group_path->path,
+							   inner_path,
+							   merge_pathkeys,
+							   cur_mergeclauses,
+							   outerkeys,
+							   innerkeys,
+							   jointype,
+							   extra);
+		}
 	}
 }
 
@@ -850,6 +864,18 @@ match_unsorted_outer(PlannerInfo *root,
 			Assert(outerpath);
 		}
 
+		/* try early grouping paths, if any */
+		if (innerrel->cheapest_group_path != NULL)
+		{
+			try_nestloop_path(root,
+							  joinrel,
+							  outerpath,
+							  &innerrel->cheapest_group_path->path,
+							  innerrel->cheapest_group_path->path.pathkeys,
+							  jointype,
+							  extra);
+		}
+
 		/*
 		 * The result will have this sort order (even if it is implemented as
 		 * a nestloop, and even if some of the mergeclauses are implemented by
@@ -1251,6 +1277,16 @@ hash_inner_and_outer(PlannerInfo *root,
 								  jointype,
 								  extra);
 
+			if (outerrel->cheapest_group_path != NULL)
+				try_hashjoin_path(root,
+								  joinrel,
+								  &outerrel->cheapest_group_path->path,
+								  cheapest_total_inner,
+								  hashclauses,
+								  jointype,
+								  extra);
+
+
 			foreach(lc1, outerrel->cheapest_parameterized_paths)
 			{
 				Path	   *outerpath = (Path *) lfirst(lc1);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f461586..8e31612 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -56,6 +56,7 @@ static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_p
 static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path);
+static Plan *create_grouping_plan(PlannerInfo *root, GroupingPath *best_path);
 static SeqScan *create_seqscan_plan(PlannerInfo *root, Path *best_path,
 					List *tlist, List *scan_clauses);
 static SampleScan *create_samplescan_plan(PlannerInfo *root, Path *best_path,
@@ -273,6 +274,10 @@ create_plan_recurse(PlannerInfo *root, Path *best_path)
 			plan = create_unique_plan(root,
 									  (UniquePath *) best_path);
 			break;
+		case T_Grouping:
+			plan = create_grouping_plan(root,
+										(GroupingPath *) best_path);
+			break;
 		default:
 			elog(ERROR, "unrecognized node type: %d",
 				 (int) best_path->pathtype);
@@ -1101,6 +1106,141 @@ create_unique_plan(PlannerInfo *root, UniquePath *best_path)
 	return plan;
 }
 
+/*
+ * create_grouping_plan
+ *	  Create a Grouping plan for 'best_path' and (recursively) plans
+ *	  for its subpaths.
+ *
+ *	  Returns a Plan node.
+ */
+static Plan *
+create_grouping_plan(PlannerInfo *root, GroupingPath *best_path)
+{
+	Plan	   *plan;
+	Plan	   *subplan;
+	List	   *group_exprs;
+	List	   *newtlist;
+	int			nextresno;
+	bool		newitems;
+	int			numGroupCols;
+	AttrNumber *groupColIdx;
+	int			groupColPos;
+	ListCell   *l;
+
+	subplan = create_plan_recurse(root, best_path->subpath);
+
+	group_exprs = best_path->group_exprs;
+
+	newtlist = build_path_tlist(root, &best_path->path);
+	nextresno = list_length(newtlist) + 1;
+	newitems = false;
+
+	foreach(l, group_exprs)
+	{
+		Node	   *groupexpr = lfirst(l);
+		TargetEntry *tle;
+
+		tle = tlist_member(groupexpr, newtlist);
+		if (!tle)
+		{
+			tle = makeTargetEntry((Expr *) groupexpr,
+				nextresno,
+				NULL,
+				false);
+			newtlist = lappend(newtlist, tle);
+			nextresno++;
+			newitems = true;
+		}
+	}
+
+	if (newitems || best_path->aggstrategy == AGG_SORTED)
+	{
+		/*
+		 * If the top plan node can't do projections and its existing target
+		 * list isn't already what we need, we need to add a Result node to
+		 * help it along.
+		 */
+		if (!is_projection_capable_plan(subplan) &&
+			!tlist_same_exprs(newtlist, subplan->targetlist))
+			subplan = (Plan *) make_result(root, newtlist, NULL, subplan);
+		else
+			subplan->targetlist = newtlist;
+	}
+
+
+	/*
+	 * Build control information showing which subplan output columns are to
+	 * be examined by the grouping step.  Unfortunately we can't merge this
+	 * with the previous loop, since we didn't then know which version of the
+	 * subplan tlist we'd end up using.
+	 */
+	newtlist = subplan->targetlist;
+	numGroupCols = list_length(group_exprs);
+	groupColIdx = (AttrNumber *) palloc(numGroupCols * sizeof(AttrNumber));
+
+	groupColPos = 0;
+	foreach(l, group_exprs)
+	{
+		Node	   *groupexpr = lfirst(l);
+		TargetEntry *tle;
+
+		tle = tlist_member(groupexpr, newtlist);
+		if (!tle)				/* shouldn't happen */
+			elog(ERROR, "failed to find group expression in subplan tlist");
+		groupColIdx[groupColPos++] = tle->resno;
+	}
+
+	if (best_path->aggstrategy == AGG_HASHED)
+	{
+		long		numGroups;
+
+		numGroups = (long) Min(best_path->path.rows, (double) LONG_MAX);
+
+		plan = (Plan *) make_agg(root,
+								 build_path_tlist(root, &best_path->path),
+								 NIL,
+								 AGG_HASHED,
+								 NULL,
+								 numGroupCols,
+								 groupColIdx,
+								 extract_grouping_ops(best_path->group_clause),
+								 NIL,
+								 numGroups,
+								 subplan);
+	}
+	else
+	{
+		long		numGroups;
+		AggClauseCosts agg_costs;
+
+		MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+
+		numGroups = (long) Min(best_path->path.rows, (double) LONG_MAX);
+
+		if (!pathkeys_contained_in(best_path->path.pathkeys, best_path->subpath->pathkeys))
+			subplan = (Plan *) make_sort_from_groupcols(root,
+														best_path->group_clause,
+														groupColIdx,
+														subplan);
+
+		plan = (Plan *) make_agg(root,
+								 build_path_tlist(root, &best_path->path),
+								 NIL,
+								 AGG_SORTED,
+								 &agg_costs,
+								 numGroupCols,
+								 groupColIdx,
+								 extract_grouping_ops(best_path->group_clause),
+								 NIL,
+								 numGroups,
+								 subplan);
+	}
+
+	/* Adjust output size estimate (other fields should be OK already) */
+	plan->plan_rows = best_path->path.rows;
+
+	return plan;
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 848df97..75fc81e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -20,11 +20,14 @@
  */
 #include "postgres.h"
 
+#include "nodes/nodeFuncs.h"
 #include "optimizer/orclauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/planmain.h"
+#include "optimizer/tlist.h"
+#include "optimizer/var.h"
 
 
 /*
@@ -106,6 +109,22 @@ query_planner(PlannerInfo *root, List *tlist,
 	root->placeholder_list = NIL;
 	root->initial_rels = NIL;
 
+	/* XXX for now, only enable this when there are no aggregates as we
+	 * don't yet have combine support for aggregates
+	 */
+	if (parse->hasAggs == false &&
+		parse->groupClause != NIL &&
+		!expression_returns_set((Node *) parse->targetList))
+	{
+		List *groupExprs;
+
+		root->tryGroupBeforeJoin = true;
+
+		groupExprs = get_sortgrouplist_exprs(parse->groupClause, parse->targetList);
+		root->minimumGroupingRels = pull_varnos((Node *) groupExprs);
+	}
+
+
 	/*
 	 * Make a flattened version of the rangetable for faster access (this is
 	 * OK because the rangetable won't change any more), and set up an empty
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 935bc2b..4d95652 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -24,6 +24,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/planmain.h"
 #include "optimizer/restrictinfo.h"
+#include "optimizer/tlist.h"
 #include "optimizer/var.h"
 #include "parser/parsetree.h"
 #include "utils/lsyscache.h"
@@ -1068,6 +1069,149 @@ create_result_path(List *quals)
 }
 
 /*
+ * create_grouping_path
+ *		Creates a path to perform grouping using either hashed, sorted
+ *		or plain.
+ *
+ * 'combine_states' instructs the executor that this path should use the
+ * aggregate state combination functions to merge aggregate transition states
+ * rather than the transition function.
+ *
+ * 'finalize_aggs' instructs the executor to call the final function on the
+ * aggregated transition states, when false the final function is not called.
+ */
+GroupingPath *
+create_grouping_path(PlannerInfo *root, RelOptInfo *rel,
+					 AggStrategy aggstrategy, List *group_clause,
+					 bool combine_states, bool finalize_aggs)
+{
+	GroupingPath   *pathnode = makeNode(GroupingPath);
+	List		   *groupExprs;
+	double			dNumGroups;
+	Query		   *parse = root->parse;
+	double			path_rows;
+	int				path_width;
+	AggClauseCosts agg_costs;
+
+	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+
+	path_rows = rel->rows;
+	path_width = rel->width;
+
+	pathnode->path.pathtype = T_Grouping;
+	pathnode->path.parent = rel;
+	pathnode->path.param_info = NULL;
+
+	pathnode->aggstrategy = aggstrategy;
+	pathnode->group_clause = group_clause;
+	pathnode->combine_states = combine_states;
+	pathnode->finalize_aggs = finalize_aggs;
+
+	/*
+	 * estimate the number of groups, this will become 1 if there's no grouping
+	 * columns
+	 */
+	pathnode->group_exprs = groupExprs =
+			get_sortgrouplist_exprs(parse->groupClause, parse->targetList);
+
+	dNumGroups = estimate_num_groups(root, groupExprs, path_rows, NULL);
+
+	if (parse->hasAggs)
+	{
+		/*
+		 * Collect statistics about aggregates for estimating costs. Note:
+		 * we do not attempt to detect duplicate aggregates here; a
+		 * somewhat-overestimated cost is okay for our present purposes.
+		 */
+		count_agg_clauses(root, (Node *)parse->targetList, &agg_costs);
+	}
+
+	if (aggstrategy == AGG_SORTED)
+	{
+		Path	   *sorted_path;
+		Path	   *cheapest_path;
+
+		/*
+		 * if the group by clause is empty or cannot be sorted then we'll
+		 * return NULL and rely on a hashing path being created.
+		 */
+		if (group_clause == NIL || !grouping_is_sortable(group_clause))
+			return NULL;
+
+		cheapest_path = rel->cheapest_total_path;
+
+		pathnode->path.pathkeys = make_pathkeys_for_sortclauses(root, group_clause,
+								parse->targetList);
+
+		/* can we find a pre-sorted path for the required sort order? */
+		sorted_path = get_cheapest_path_for_pathkeys(rel->pathlist,
+			pathnode->path.pathkeys, NULL, TOTAL_COST);
+
+		/*
+		 * if no pre-sorted path exists, we'll need to perform an explicit sort
+		 * on the cheapest_path.
+		 */
+		if (sorted_path == NULL)
+		{
+			/* Figure cost for sorting */
+			cost_sort(&pathnode->path, root, pathnode->path.pathkeys,
+				cheapest_path->total_cost, path_rows, path_width, 0.0,
+				work_mem, -1);
+
+			cost_agg(&pathnode->path, root, AGG_SORTED, &agg_costs,
+				list_length(group_clause), dNumGroups,
+				pathnode->path.startup_cost, pathnode->path.total_cost,
+				path_rows);
+
+			pathnode->subpath = cheapest_path;
+		}
+		else
+		{
+			cost_agg(&pathnode->path, root, AGG_SORTED, &agg_costs,
+				list_length(group_clause), dNumGroups,
+				pathnode->path.startup_cost, pathnode->path.total_cost,
+				path_rows);
+
+			pathnode->subpath = sorted_path;
+		}
+
+		return pathnode;
+	}
+	else /* AGG_HASHED & AGG_PLAIN */
+	{
+		Path	   *cheapest_path;
+
+		if (agg_costs.numOrderedAggs > 0)
+			return NULL;
+
+		if (aggstrategy == AGG_HASHED)
+		{
+			if (!grouping_is_hashable(group_clause))
+				return NULL;
+
+			if (!enable_hashagg)
+				pathnode->path.startup_cost += disable_cost;
+		}
+
+		/* hashing requires no order, so we always use the cheapest path */
+		cheapest_path = rel->cheapest_total_path;
+
+		pathnode->path.pathkeys = NIL;
+		pathnode->path.startup_cost += cheapest_path->startup_cost;
+		pathnode->path.total_cost += cheapest_path->total_cost;
+
+		cost_agg(&pathnode->path, root, aggstrategy, &agg_costs,
+			list_length(group_clause), dNumGroups,
+			pathnode->path.startup_cost, pathnode->path.total_cost,
+			path_rows);
+
+		pathnode->subpath = cheapest_path;
+
+		return pathnode;
+	}
+}
+
+/*
  * create_material_path
  *	  Creates a path corresponding to a Material plan, returning the
  *	  pathnode.
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1b7b914..9ff9e5c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -873,6 +873,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
+		{ "enable_earlygrouping", PGC_USERSET, QUERY_TUNING_METHOD,
+		gettext_noop("Enables the planner's ability to perform GROUP BY before joining to unrelated tables."),
+		NULL
+		},
+		&enable_earlygrouping,
+		true,
+		NULL, NULL, NULL
+	},
+	{
 		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
 			gettext_noop("Enables genetic query optimization."),
 			gettext_noop("This algorithm attempts to do planning without "
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 748e434..83d4ef9 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -73,6 +73,7 @@ typedef enum NodeTag
 	T_Group,
 	T_Agg,
 	T_WindowAgg,
+	T_Grouping,
 	T_Unique,
 	T_Hash,
 	T_SetOp,
@@ -232,6 +233,7 @@ typedef enum NodeTag
 	T_HashPath,
 	T_TidPath,
 	T_ForeignPath,
+	T_GroupingPath,
 	T_CustomPath,
 	T_AppendPath,
 	T_MergeAppendPath,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index cb916ea..a7836eb 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -16,6 +16,7 @@
 
 #include "access/sdir.h"
 #include "lib/stringinfo.h"
+#include "nodes/plannodes.h"
 #include "nodes/params.h"
 #include "nodes/parsenodes.h"
 #include "storage/block.h"
@@ -249,7 +250,10 @@ typedef struct PlannerInfo
 	bool		hasPseudoConstantQuals; /* true if any RestrictInfo has
 										 * pseudoconstant = true */
 	bool		hasRecursion;	/* true if planning a recursive WITH item */
-
+	bool		tryGroupBeforeJoin;	/* true if we should try to perform
+									 * grouping before the final level */
+	Relids		minimumGroupingRels; /* minimum set of rels which must be
+									  * joined before grouping can happen */
 	/* These fields are used only when hasRecursion is true: */
 	int			wt_param_id;	/* PARAM_EXEC ID for the work table */
 	struct Plan *non_recursive_plan;	/* plan for non-recursive term */
@@ -447,6 +451,8 @@ typedef struct RelOptInfo
 	struct Path *cheapest_startup_path;
 	struct Path *cheapest_total_path;
 	struct Path *cheapest_unique_path;
+	struct GroupingPath *cheapest_sorted_group_path;
+	struct GroupingPath *cheapest_group_path;
 	List	   *cheapest_parameterized_paths;
 
 	/* information about a base rel (not set for join rels!) */
@@ -820,6 +826,41 @@ typedef struct IndexPath
 	Selectivity indexselectivity;
 } IndexPath;
 
+/*----------
+ * GroupingPath represents grouping of rows into groups from the output of its
+ * subpath.
+ *
+ * This is unlike most other Path nodes in that it can actually generate
+ * different plans: either hash-based or sort-based implementation, or a
+ * single group if the 'group_clause' is NIL.
+ *
+ * 'subpath' path to perform grouping on.
+ *
+ * 'aggstrategy' method of grouping, hashed, sorted or plain.
+ *
+ * 'group_clause' GROUP BY clause as a list of SortGroupClause items.
+ *
+ * 'group_exprs' GROUP BY clause as a list of Expr items.
+ *
+ * 'combine_states' true if subpath is supplying already partially,
+ * non-finalized groups.
+ *
+ * 'finalize_aggs' true if final aggregate values are required, this should
+ * always be true for the highest level GroupingPath in the plan tree, and
+ * always false when there's GroupingPaths higher up in the plan tree.
+ *----------
+ */
+typedef struct GroupingPath
+{
+	Path		path;
+	Path	   *subpath;			/* Access path which this path must use */
+	AggStrategy	aggstrategy;		/* Plain, Hashed, Sorted */
+	List	   *group_clause;		/* GROUP BY clause as SortGroupClauses */
+	List	   *group_exprs;		/* GROUP BY clause as Exprs */
+	bool		combine_states;		/* Input are states instead of values */
+	bool		finalize_aggs;	/* call the finalfn on aggregates? */
+} GroupingPath;
+
 /*
  * BitmapHeapPath represents one or more indexscans that generate TID bitmaps
  * instead of directly accessing the heap, followed by AND/OR combinations
@@ -1008,7 +1049,7 @@ typedef struct MaterialPath
  * UniquePath represents elimination of distinct rows from the output of
  * its subpath.
  *
- * This is unlike the other Path nodes in that it can actually generate
+ * This is unlike most other Path nodes in that it can actually generate
  * different plans: either hash-based or sort-based implementation, or a
  * no-op if the input path can be proven distinct already.  The decision
  * is sufficiently localized that it's not worth having separate Path node
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index dd43e45..aaa1e08 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -61,6 +61,7 @@ extern bool enable_nestloop;
 extern bool enable_material;
 extern bool enable_mergejoin;
 extern bool enable_hashjoin;
+extern bool enable_earlygrouping;
 extern int	constraint_exclusion;
 
 extern double clamp_row_est(double nrows);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 161644c..ac75208 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -66,6 +66,9 @@ extern MergeAppendPath *create_merge_append_path(PlannerInfo *root,
 						 List *pathkeys,
 						 Relids required_outer);
 extern ResultPath *create_result_path(List *quals);
+extern GroupingPath *create_grouping_path(PlannerInfo *root, RelOptInfo *rel,
+						 AggStrategy aggstrategy, List *group_clause,
+						 bool combine_states, bool finalize_aggs);
 extern MaterialPath *create_material_path(RelOptInfo *rel, Path *subpath);
 extern UniquePath *create_unique_path(PlannerInfo *root, RelOptInfo *rel,
 				   Path *subpath, SpecialJoinInfo *sjinfo);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 3e2378a..1983bef 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -47,6 +47,7 @@ extern PGDLLIMPORT join_search_hook_type join_search_hook;
 
 
 extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
+extern void build_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
#2 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#1)
Re: GROUP BY before JOIN

This looks like one example of a general problem: finding the optimal order
for SQL operations. Consider a query defined as sql_op1(sql_op2(sql_op3(A,
B), sql_op4(C, D), sql_op5(E, F))) where each sql_op can be a SQL operation
like join, grouping, aggregation, projection, union, intersection, limit,
ordering, etc., and A through F are tables. Given such a representation, how
do we find an optimal order of operations?

In your example, the query defined as GROUPBY(JOIN(A, B)) might have an
optimal order JOIN(GROUPBY(A), GROUPBY(B)) if we can prove GROUPBY is
distributive over JOIN. Similarly GROUPBY(UNION(A, B)) might have an optimal
order UNION(GROUPBY(A), GROUPBY(B)) if A and B produce disjoint groups, i.e.
GROUPBY is distributive over UNION.

Right now, we use dynamic programming to find the optimal order for the join
tree (make_one_rel). In the language above, we find the optimal order for
evaluating an expression consisting of JOIN operators by exploiting their
commutative and associative properties. If we could extend that to the SQL
expression tree representing the whole query, we would be able to solve many
such re-ordering problems using the current path infrastructure. The dynamic
program for this would look something like:

root = the top operator of the query
children = the operands of root

cost of query = minimum of:
1. cost of the query without any mutation
2. cost with root and its child operator commuted, if root has only one
child and the two operators are commutable
3. cost with root distributed over the children's operands, if there are
multiple operands and root is distributive over the child operators
4. ...
5. ...

Each operator will have a RelOptInfo to which we add the many possible paths
for performing that operation using different orders of evaluation. The
current add_path logic should eliminate the expensive ones.

This is going to explode the number of path (plan) trees considered by the
planner, so we need to be judicious about when to apply this technique, or
have a rigorous pruning technique to discard costly paths early enough. This
approach might be better than trying to solve each re-ordering problem
separately, and might go well with the upper planner pathification being
discussed at
/messages/by-id/CA+TgmoaqgzhUX7frzddpGhkqT3rcNx-f0dF+ZqzsEfW-ARXAww@mail.gmail.com

On Tue, Aug 4, 2015 at 1:57 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

== Overview ==

As of today we always perform GROUP BY at the final stage, after each
relation in the query has been joined. This of course works, but it's not
always the most optimal way of executing the query.

Consider the following two relations:

create table product (product_id int primary key, code varchar(32) not
null);
create table sale (sale_id int primary key, product_id int not null, qty
int not null);
create index on sale (product_id);

Populated with:

insert into product select x.x,'ABC' || x.x from generate_series(1,100)
x(x);
insert into sale select x.x,x.x%100+1, 10 from generate_series(1,1000000)
x(x);

Here we have a table with 100 products and another with 1 million sales
records.

If we wanted to see the total sales for each product we may write a query
such as:

select s.product_id,sum(qty) as qty from sale s inner join product p on
s.product_id = p.product_id group by s.product_id;

If we look at the explain for this query we can see that the grouping
happened after the join took place:

                    QUERY PLAN
--------------------------------------------------
 HashAggregate
   Group Key: s.product_id
   ->  Hash Join
         Hash Cond: (s.product_id = p.product_id)
         ->  Seq Scan on sale s
         ->  Hash
               ->  Seq Scan on product p

The join here would produce exactly 1 million rows, and then HashAggregate
would group those 1 million rows.

If it were possible for PostgreSQL to perform the GROUP BY before the join,
we could change that to grouping 1 million rows first, then joining just 100
rows.

Of course we could have written the query as:

select s.product_id,qty from (select product_id,sum(qty) as qty from sale
group by product_id) s inner join product p on s.product_id = p.product_id;

This would do exactly that, and in this particular case it is much faster.
But it's not all rainbows and unicorns: if the join happened to filter out
many of the groups, then it might not be a win at all.

Consider:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id where p.code = 'ABC1' group by s.product_id;

Since product.code has no equivalence member in the sale table, the
predicate cannot be applied to the sale table. So, in this case, if we'd
written the query as:

select s.product_id,qty from (select product_id,sum(qty) as qty from sale
group by product_id) s inner join product p on s.product_id = p.product_id
where p.code = 'ABC1';

We'd have thrown away 99% of the groups created in the subquery.
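A back-of-the-envelope model makes the trade-off concrete. The cost model here (rows processed) and all numbers are illustrative assumptions only, not planner costing:

```python
def cost_group_first(sale_rows=1_000_000, ngroups=100, products_kept=1):
    """Aggregate every sale row first, then join the resulting groups;
    groups for filtered-out products are wasted work."""
    return sale_rows + ngroups + products_kept

def cost_join_first(sale_rows=1_000_000, ngroups=100, products_kept=1):
    """Filter/join first: only sale rows for surviving products need to be
    fetched (e.g. via the index on sale (product_id)) and then grouped."""
    surviving_sale = sale_rows * products_kept // ngroups
    return surviving_sale + surviving_sale  # join probes + aggregation input

# With p.code = 'ABC1' keeping a single product, joining first is far cheaper:
print(cost_group_first(), cost_join_first())

# With no filtering predicate (all 100 products kept), grouping first wins:
print(cost_group_first(products_kept=100), cost_join_first(products_kept=100))
```

The break-even point depends entirely on how many groups the join preserves, which is exactly why this needs to be a costed decision rather than a blanket transformation.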

== Proposal ==

I've been working on allowing the planner to properly cost which option is
better, and to perform the GROUP BY at the join level estimated to be most
efficient.

So far I've not gotten as far as supporting queries which contain
aggregate functions, as I need the ability to combine aggregate states
(Simon and I proposed a patch for that here
https://commitfest.postgresql.org/5/131/).

Performance of the above test case is looking quite good so far:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id group by s.product_id;

-- Master = 312.294 ms
-- Patched = 95.328 ms

So it's roughly 3.3 times faster in this particular case.

== Implementation ==

I've had to make some choices about how exactly to implement this feature.
As I explain above, we can't just always do the grouping at the first
possible opportunity due to the possibility of joins eliminating groups and
needless groups being created.

To support this in the planner I've invented a GroupingPath type, and I've
also added 'cheapest_sorted_group_path' and 'cheapest_group_path' to
RelOptInfo. I wasn't too sure about this part: I wondered if these
GroupingPaths should just be added to the normal list with add_path(), but
since these paths perform more work than normal paths, that would give them
an unfair disadvantage and they could be thrown out, even though they may
only become cheaper after a subsequent JOIN has taken place.
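A numeric illustration of that unfair disadvantage (all costs are made up; only the shape of the argument matters): judged on its own cost, a grouped path loses at the base rel, even though it wins once the join's cost is included:

```python
# Hypothetical path costs for the sale relation.
seqscan_cost = 100.0   # plain scan path
grouped_cost = 180.0   # the same scan with hash aggregation on top

# Pruning by cost alone at the base rel would discard the grouped path:
kept_at_baserel = min(seqscan_cost, grouped_cost)

def join_cost(input_cost, rows_out, inner_rows=100, per_row=0.01):
    """Join cost grows with the number of rows the outer path emits."""
    return input_cost + (rows_out + inner_rows) * per_row

plain_total = join_cost(seqscan_cost, rows_out=1_000_000)
grouped_total = join_cost(grouped_cost, rows_out=100)

print(kept_at_baserel)             # the grouped path lost at the base rel...
print(plain_total, grouped_total)  # ...despite being far cheaper post-join
```

This is why keeping the grouped paths in separate fields (rather than competing in the ordinary add_path list) can make sense: their payoff only materializes at a higher join level.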

There's also an issue with row estimates being wrong on joinrels: the
estimate uses the base rel's row count rather than the number of groups. As
yet I'm unsure of the best way to fix this.

I wanted to post this early so as to register the fact that I'm working on
this, but I'm also posting in the hope of getting some early feedback on
what I have so far.

Of course there's lots more work to do here: aggregates need to be
supported, and functional dependencies and transitive closures will also
need to be detected in a more complete implementation.

Here's what I have so far (attached)

Comments are most welcome. Thanks.

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#3David Rowley
david.rowley@2ndquadrant.com
In reply to: Ashutosh Bapat (#2)
Re: GROUP BY before JOIN

On 4 August 2015 at 21:56, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
wrote:

This looks like one example of the general problem of finding an optimal
order for SQL operations. Consider a query defined as
sql_op1(sql_op2(sql_op3(A, B), sql_op4(C, D), sql_op5(E, F))), where sql_op
can be an SQL operation like join, grouping, aggregation, projection, union,
intersection, limit, ordering etc., and A through F are tables. Given such a
representation, how do we find an optimal order of operation?

In your example, the query which is defined like GROUPBY(JOIN(A, B)) might
have an optimal order JOIN(GROUPBY(A), GROUPBY(B)) if we can prove GROUPBY
is distributive over JOIN. Similarly, GROUPBY(UNION(A, B)) might have an
optimal order UNION(GROUPBY(A), GROUPBY(B)) if A and B produce disjoint
groups, i.e. GROUPBY is distributive over UNION.

I'm not really sure that I understand why it's GROUPBY(A) and GROUPBY(B).
If the group by contained columns from relations A and B, then it wouldn't
be possible to group each one individually. We would have to wait until the
join was performed in that case.

Right now, we use dynamic programming to find the optimal order for the
join tree (make_one_rel). In the language above, we find the optimal order
for evaluating an expression consisting of JOIN operators by exploiting
their commutative and associative properties. If we could extend that to the
SQL expression tree representing the query, we will be able to solve many
such re-ordering problems using the current path infrastructure. The dynamic
program for this would look something like this:

root = top operator of the query
child = the operands of the root

cost of query = minimum of
1. cost of query without any mutation
2. cost of root and its child operator commuted, if there is only one
child for root operator and root operator and child operator are commutable
3. cost of root distributed over children operands if there are multiple
operands and root operator is distributive over child operators
4. ...
5. ...

Each operator root will have a RelOptInfo to which we add the many possible
paths for performing that operation using different orders of evaluation.
The current add_path should eliminate the expensive ones.

So in a query such as:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id group by s.product_id;

Do you mean that there would be 2 RelOptInfos for the sale relation, one
without the GROUP BY, and one with? That would likely solve the issue I
have with join row estimates, but which of the 2 RelOptInfo would all of
the Vars in the parse tree be pointing to?

This is going to explode the number of path (plan) trees considered by the
planner, so we need to be judicious about when to apply this technique, or
have a rigorous pruning technique to discard costly paths early enough. This
approach might be better than trying to solve each re-ordering problem
separately and might go well with upper planner pathification being
discussed at
/messages/by-id/CA+TgmoaqgzhUX7frzddpGhkqT3rcNx-f0dF+ZqzsEfW-ARXAww@mail.gmail.com
.

It's certainly going to increase that, but I'm not sure if 'explode' is the
best word as we at least require all of the rels seen in the GROUP BY
expressions and aggregate function parameters to be joined before we can
consider a GroupingPath. The patch
uses bms_is_subset(root->minimumGroupingRels, rel->relids) to determine
that.
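A frozenset stand-in for that bitmapset test: a GroupingPath is only considered once every relation referenced by the GROUP BY expressions (and aggregate arguments) is part of the joined relids. The relation names below are illustrative:

```python
def can_consider_grouping(minimum_grouping_rels, joined_relids):
    # analogous to bms_is_subset(root->minimumGroupingRels, rel->relids)
    return minimum_grouping_rels <= joined_relids

# GROUP BY s.product_id references only the sale relation.
minimum = frozenset({"sale"})

print(can_consider_grouping(minimum, frozenset({"sale"})))            # True
print(can_consider_grouping(minimum, frozenset({"sale", "product"}))) # True

# If the GROUP BY also referenced product, the base sale rel alone
# could not host the grouping:
print(can_consider_grouping(frozenset({"sale", "product"}),
                            frozenset({"sale"})))                     # False
```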


--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: David Rowley (#3)
Re: GROUP BY before JOIN

On Tue, Aug 4, 2015 at 4:00 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 4 August 2015 at 21:56, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This looks like one example of the general problem of finding an optimal
order for SQL operations. Consider a query defined as
sql_op1(sql_op2(sql_op3(A, B), sql_op4(C, D), sql_op5(E, F))), where sql_op
can be an SQL operation like join, grouping, aggregation, projection, union,
intersection, limit, ordering etc., and A through F are tables. Given such a
representation, how do we find an optimal order of operation?

In your example, the query which is defined like GROUPBY(JOIN(A, B)) might
have an optimal order JOIN(GROUPBY(A), GROUPBY(B)) if we can prove GROUPBY
is distributive over JOIN. Similarly, GROUPBY(UNION(A, B)) might have an
optimal order UNION(GROUPBY(A), GROUPBY(B)) if A and B produce disjoint
groups, i.e. GROUPBY is distributive over UNION.

I'm not really sure that I understand why it's GROUPBY(A) and GROUPBY(B).
If the group by contained columns from relations A and B, then it wouldn't
be possible to group each one individually. We would have to wait until the
join was performed in that case.

Sorry for not being clear enough. GROUPBY(A) implies a GROUPBY on relation
A, using some columns (unspecified here, but common to both A and B).
GROUPBY is used as an operator over relations, in the classic sense of an
operator. For simplicity, this representation does not specify any columns
or join conditions; it just applies SQL operators to relations as operands.

Right now, we use dynamic programming to find the optimal order for the
join tree (make_one_rel). In the language above, we find the optimal order
for evaluating an expression consisting of JOIN operators by exploiting
their commutative and associative properties. If we could extend that to the
SQL expression tree representing the query, we will be able to solve many
such re-ordering problems using the current path infrastructure. The dynamic
program for this would look something like this:

root = top operator of the query
child = the operands of the root

cost of query = minimum of
1. cost of query without any mutation
2. cost of root and its child operator commuted, if there is only one
child for root operator and root operator and child operator are commutable
3. cost of root distributed over children operands if there are multiple
operands and root operator is distributive over child operators
4. ...
5. ...

Each operator root will have a RelOptInfo to which we add the many possible
paths for performing that operation using different orders of evaluation.
The current add_path should eliminate the expensive ones.

So in a query such as:

select s.product_id from sale s inner join product p on s.product_id =
p.product_id group by s.product_id;

Do you mean that there would be 2 RelOptInfos for the sale relation, one
without the GROUP BY, and one with? That would likely solve the issue I
have with join row estimates, but which of the 2 RelOptInfo would all of
the Vars in the parse tree be pointing to?

This query would be represented as GROUPBY(JOIN(sales, product)), where the
grouping happens on s.product_id and the JOIN is on s.product_id =
p.product_id. The dynamic programming would consider GROUPBY(JOIN(sales,
product)) and JOIN(GROUPBY(sales), GROUPBY(product)) as possible evaluation
orders, since GROUPBY is distributive over JOIN in this case, and choose the
order with the least cost.
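A quick concrete check of that distributivity claim for the running example: grouping sale before the join gives the same result as grouping afterwards, because product.product_id is unique, so the join neither duplicates nor (here) eliminates sale rows. The data is a tiny illustrative stand-in:

```python
from collections import defaultdict

sale = [(1, 10), (1, 20), (2, 5)]    # (product_id, qty)
product = {1: "ABC1", 2: "ABC2"}     # product_id -> code

def group_after_join():
    totals = defaultdict(int)
    for pid, qty in sale:
        if pid in product:           # the join
            totals[pid] += qty       # then the aggregation
    return dict(totals)

def group_before_join():
    totals = defaultdict(int)
    for pid, qty in sale:
        totals[pid] += qty           # aggregate first
    # then join the groups against product
    return {pid: q for pid, q in totals.items() if pid in product}

assert group_after_join() == group_before_join() == {1: 30, 2: 5}
```

The equivalence breaks exactly when the join changes group membership, e.g. by duplicating or filtering sale rows, which is what the costing (and validity) analysis has to guard against.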

This is going to explode the number of path (plan) trees considered by the
planner, so we need to be judicious about when to apply this technique, or
have a rigorous pruning technique to discard costly paths early enough. This
approach might be better than trying to solve each re-ordering problem
separately and might go well with upper planner pathification being
discussed at
/messages/by-id/CA+TgmoaqgzhUX7frzddpGhkqT3rcNx-f0dF+ZqzsEfW-ARXAww@mail.gmail.com
.

It's certainly going to increase that, but I'm not sure if 'explode' is
the best word as we at least require all of the rels seen in the GROUP BY
expressions and aggregate function parameters to be joined before we can
consider a GroupingPath. The patch
uses bms_is_subset(root->minimumGroupingRels, rel->relids) to determine
that.


--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company