MIN/MAX optimization for partitioned tables
Consider the following schema:
create table foo_archive (a int, b timestamp);
create index foo_archive_idx on foo_archive(b);
CREATE TABLE foo_archive_2007_01_01 (CONSTRAINT
foo_archive_2007_01_01_b_check CHECK (((b >= '2007-01-01'::date) AND (b <
'2007-01-02'::date)))) INHERITS (foo_archive);
CREATE INDEX foo_archive_2007_01_01_idx ON foo_archive_2007_01_01 USING
btree (b);
CREATE TABLE foo_archive_2007_01_02 (CONSTRAINT
foo_archive_2007_01_02_b_check CHECK (((b >= '2007-01-02'::date) AND (b <
'2007-01-03'::date)))) INHERITS (foo_archive);
CREATE INDEX foo_archive_2007_01_02_idx ON foo_archive_2007_01_02 USING
btree (b);
...
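For context, one minimal way to load test data that roughly matches the row
counts in the plan below (an assumption on my part; the original post does not
show the data load) is one row per second per daily partition:
INSERT INTO foo_archive_2007_01_01 (a, b)
  SELECT g, TIMESTAMP '2007-01-01 00:00:00' + g * INTERVAL '1 second'
  FROM generate_series(0, 86399) AS g;
INSERT INTO foo_archive_2007_01_02 (a, b)
  SELECT g, TIMESTAMP '2007-01-02 00:00:00' + g * INTERVAL '1 second'
  FROM generate_series(0, 86399) AS g;
-- ...repeat for the remaining daily partitions, then:
ANALYZE;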
Currently the optimizer yields the following plan:
postgres=# explain select max(b) from foo_archive;
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
Aggregate (cost=18602.00..18602.01 rows=1 width=8)
-> Append (cost=0.00..16005.00 rows=1038800 width=8)
-> Seq Scan on foo_archive (cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_01 foo_archive
(cost=0.00..1331.99 rows=86399 width=8)
-> Seq Scan on foo_archive_2007_01_02 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_03 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_04 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_05 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_06 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_07 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_08 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_09 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_10 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_11 foo_archive
(cost=0.00..1332.00 rows=86400 width=8)
-> Seq Scan on foo_archive_2007_01_12 foo_archive
(cost=0.00..765.01 rows=49601 width=8)
-> Seq Scan on foo_archive_2007_01_13 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_14 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_15 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_16 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_17 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_18 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_19 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_20 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_21 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_22 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_23 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_24 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_25 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_26 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_27 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_28 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_29 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_30 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
-> Seq Scan on foo_archive_2007_01_31 foo_archive
(cost=0.00..29.40 rows=1940 width=8)
(34 rows)
As we can see, the optimizer does not take advantage of the indexes on
column b in the child relations.
Attached is a patch that takes advantage of those indexes (when they are
available and the index path is cheaper) and yields the following plan.
postgres=# explain select max(b) from foo_archive;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.54..1.55 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_idx on foo_archive
(cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_01_idx on
foo_archive_2007_01_01 foo_archive (cost=0.00..2719.24 rows=86399 width=8)
Filter: (b IS NOT NULL)
InitPlan 3 (returns $2)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_02_idx on
foo_archive_2007_01_02 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 4 (returns $3)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_03_idx on
foo_archive_2007_01_03 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 5 (returns $4)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_04_idx on
foo_archive_2007_01_04 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 6 (returns $5)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_05_idx on
foo_archive_2007_01_05 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 7 (returns $6)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_06_idx on
foo_archive_2007_01_06 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 8 (returns $7)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_07_idx on
foo_archive_2007_01_07 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 9 (returns $8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_08_idx on
foo_archive_2007_01_08 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 10 (returns $9)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_09_idx on
foo_archive_2007_01_09 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 11 (returns $10)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_10_idx on
foo_archive_2007_01_10 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 12 (returns $11)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_11_idx on
foo_archive_2007_01_11 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
Filter: (b IS NOT NULL)
InitPlan 13 (returns $12)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_12_idx on
foo_archive_2007_01_12 foo_archive (cost=0.00..1568.27 rows=49601 width=8)
Filter: (b IS NOT NULL)
InitPlan 14 (returns $13)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_13_idx on
foo_archive_2007_01_13 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 15 (returns $14)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_14_idx on
foo_archive_2007_01_14 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 16 (returns $15)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_15_idx on
foo_archive_2007_01_15 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 17 (returns $16)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_16_idx on
foo_archive_2007_01_16 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 18 (returns $17)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_17_idx on
foo_archive_2007_01_17 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 19 (returns $18)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_18_idx on
foo_archive_2007_01_18 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 20 (returns $19)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_19_idx on
foo_archive_2007_01_19 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 21 (returns $20)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_20_idx on
foo_archive_2007_01_20 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 22 (returns $21)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_21_idx on
foo_archive_2007_01_21 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 23 (returns $22)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_22_idx on
foo_archive_2007_01_22 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 24 (returns $23)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_23_idx on
foo_archive_2007_01_23 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 25 (returns $24)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_24_idx on
foo_archive_2007_01_24 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 26 (returns $25)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_25_idx on
foo_archive_2007_01_25 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 27 (returns $26)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_26_idx on
foo_archive_2007_01_26 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 28 (returns $27)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_27_idx on
foo_archive_2007_01_27 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 29 (returns $28)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_28_idx on
foo_archive_2007_01_28 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 30 (returns $29)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_29_idx on
foo_archive_2007_01_29 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 31 (returns $30)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_30_idx on
foo_archive_2007_01_30 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
InitPlan 32 (returns $31)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_31_idx on
foo_archive_2007_01_31 foo_archive (cost=0.00..73.35 rows=1940 width=8)
Filter: (b IS NOT NULL)
-> Append (cost=0.00..0.32 rows=32 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
(162 rows)
Does this patch make sense for 8.5, to optimize this particular class of
queries? Specifically, queries of the form:
SELECT MAX(col) FROM tab WHERE ...;
SELECT MIN(col) FROM tab WHERE ...;
Thanks, Alan
Attachment: optimize_append_minmax.patch (text/x-patch)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 762dfb1..5f88466 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 27,32 ****
--- 27,33 ----
#include "optimizer/plancat.h"
#include "optimizer/planmain.h"
#include "optimizer/predtest.h"
+ #include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "optimizer/var.h"
*************** create_append_plan(PlannerInfo *root, Ap
*** 546,551 ****
--- 547,553 ----
List *tlist = build_relation_tlist(best_path->path.parent);
List *subplans = NIL;
ListCell *subpaths;
+ bool can_optimize_minmax;
/*
* It is possible for the subplans list to contain only one entry, or even
*************** create_append_plan(PlannerInfo *root, Ap
*** 566,577 ****
NULL);
}
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
! subplans = lappend(subplans, create_plan(root, subpath));
}
plan = make_append(subplans, false, tlist);
--- 568,618 ----
NULL);
}
+ can_optimize_minmax = can_optimize_append_minmax(root);
+
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
! RelOptInfo *parentrel = best_path->path.parent;
! RelOptInfo *childrel = subpath->parent;
! AppendRelInfo *appinfo = NULL;
! ListCell *lc;
! Plan *subplan = NULL;
!
! if (!can_optimize_minmax)
! {
! subplans = lappend(subplans, create_plan(root, subpath));
! continue;
! }
!
! /* only try to optimize children rel's */
! foreach (lc, root->append_rel_list)
! {
! AppendRelInfo *a = (AppendRelInfo *) lfirst(lc);
!
! if (a->child_relid == childrel->relid &&
! a->parent_relid == parentrel->relid)
! {
! appinfo = a;
! break;
! }
! }
!
! if (appinfo)
! {
! List *targetlist =
! (List *) adjust_appendrel_attrs((Node *) root->parse->targetList,
! appinfo);
! subplan = do_optimize_minmax_aggregates(root, targetlist, childrel,
! subpath);
! }
!
! if (subplan)
! subplans = lappend(subplans, subplan);
! else
! subplans = lappend(subplans, create_plan(root, subpath));
}
plan = make_append(subplans, false, tlist);
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 34e9617..959cc2c 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** static void make_agg_subplan(PlannerInfo
*** 53,59 ****
static Node *replace_aggs_with_params_mutator(Node *node, List **context);
static Oid fetch_agg_sort_op(Oid aggfnoid);
-
/*
* optimize_minmax_aggregates - check for optimizing MIN/MAX via indexes
*
--- 53,58 ----
*************** optimize_minmax_aggregates(PlannerInfo *
*** 77,89 ****
RangeTblRef *rtr;
RangeTblEntry *rte;
RelOptInfo *rel;
- List *aggs_list;
- ListCell *l;
- Cost total_cost;
- Path agg_p;
- Plan *plan;
- Node *hqual;
- QualCost tlist_cost;
/* Nothing to do if query has no aggregates */
if (!parse->hasAggs)
--- 76,81 ----
*************** optimize_minmax_aggregates(PlannerInfo *
*** 124,129 ****
--- 116,137 ----
return NULL;
rel = find_base_rel(root, rtr->rtindex);
+ return do_optimize_minmax_aggregates(root, tlist, rel, best_path);
+ }
+
+ Plan *
+ do_optimize_minmax_aggregates(PlannerInfo *root, List *tlist, RelOptInfo *rel,
+ Path *best_path)
+ {
+ Query *parse = root->parse;
+ List *aggs_list;
+ ListCell *l;
+ Cost total_cost;
+ Path agg_p;
+ Plan *plan;
+ Node *hqual;
+ QualCost tlist_cost;
+
/*
* Since this optimization is not applicable all that often, we want to
* fall out before doing very much work if possible. Therefore we do the
*************** optimize_minmax_aggregates(PlannerInfo *
*** 214,219 ****
--- 222,273 ----
}
/*
+ * can_optimize_append_minmax - check whether optimizing MIN/MAX via indexes
+ * can potentially be pushed down to child relations.
+ *
+ * This checks whether the query is of the form:
+ * 1. SELECT MAX(col) FROM tab WHERE ...;
+ * 2. SELECT MIN(col) FROM tab WHERE ...;
+ */
+ bool
+ can_optimize_append_minmax(PlannerInfo *root)
+ {
+ Query *parse = root->parse;
+ List *tlist = parse->targetList;
+ List *aggs_list;
+
+ /* Nothing to do if query has no aggregates */
+ if (!parse->hasAggs)
+ return false;
+
+ Assert(!parse->setOperations); /* shouldn't get here if a setop */
+ Assert(parse->rowMarks == NIL); /* nor if FOR UPDATE */
+
+ /*
+ * Reject unoptimizable cases.
+ *
+ * We don't handle GROUP BY or windowing, because our current
+ * implementations of grouping require looking at all the rows anyway, and
+ * so there's not much point in optimizing MIN/MAX.
+ */
+ if (parse->groupClause || parse->hasWindowFuncs)
+ return false;
+
+ /* We're only supporting a single MIN/MAX in the targetlist. */
+ if (list_length(tlist) > 1 || parse->havingQual)
+ return false;
+
+ aggs_list = NIL;
+ if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ return false;
+
+ if (list_length(aggs_list) > 1)
+ return false;
+
+ return true;
+ }
+
+ /*
* find_minmax_aggs_walker
* Recursively scan the Aggref nodes in an expression tree, and check
* that each one is a MIN/MAX aggregate. If so, build a list of the
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..939c1ef 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void query_planner(PlannerInfo *r
*** 34,39 ****
--- 34,42 ----
*/
extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
Path *best_path);
+ extern Plan *do_optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
+ RelOptInfo *rel, Path *best_path);
+ extern bool can_optimize_append_minmax(PlannerInfo *root);
/*
* prototypes for plan/createplan.c
Neat! I haven't read the patch yet but I have some questions.
Does this handle the case where some partitions have an index and
others don't? I.e., does it just apply the regular optimization to each
partition and then slap on the aggregate node? I think that's actually
a realistic case because people often don't have indexes on empty
partitions like the parent partition or a new partition which has just
been added and doesn't have indexes yet.
Is there any overlap with the ordered-append patch which is also in
the pipeline? AFAICT it covers similar cases but doesn't actually
overlap since the min/max optimization avoids having to do a sort
anywhere.
On Fri, Jul 17, 2009 at 2:45 PM, Greg Stark <gsstark@mit.edu> wrote:
Hi Greg,
My colleague, Jeff Davis, just pointed me to the work that you're doing with
MergeAppend. I didn't know about it.
Yes, it does handle the case where no index exists in the child partition.
It defaults to the Seqscan plan for that particular partition because it
still depends on the aggregate node on top of the append node.
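As a quick way to exercise that case (the partition name here is made up, not
from my actual test setup), one can add a child with no index and re-run the
query; that child should fall back to a plain seqscan under the Append while
the indexed children still get backward index scans:
CREATE TABLE foo_archive_2007_02_01 (CONSTRAINT
foo_archive_2007_02_01_b_check CHECK (((b >= '2007-02-01'::date) AND (b <
'2007-02-02'::date)))) INHERITS (foo_archive);
-- deliberately no index on b for this child
EXPLAIN SELECT max(b) FROM foo_archive;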
I haven't looked at your MergeAppend patch, so I don't know how much overlap
there is. Based on my limited understanding of it, I think they may be two
different approaches to the same problem: yours is a more general solution
that covers a wider set of optimizations for partitioned tables, while I'm
trying to solve a very specific problem. You are also correct that my patch
does not have to sort on partitions without the appropriate index, so the
plan it generates should be cheaper.
Any more thoughts about my patch or ways of making the two patches work
together would be greatly appreciated.
Thanks, Alan
This part:
! /* only try to optimize children rel's */
! foreach (lc, root->append_rel_list)
! {
! AppendRelInfo *a = (AppendRelInfo *) lfirst(lc);
!
! if (a->child_relid == childrel->relid &&
! a->parent_relid == parentrel->relid)
! {
! appinfo = a;
! break;
! }
! }
Looks like O(n^2). I guess that's a bigger problem than with just this
patch. Perhaps append_rel_list should be a dynahash in general. I
never really understood why it was simpler to have a single global
append_rel_list anyways.
The other thing is it would be nice if we could avoid making separate
subplans for each child and instead make one for the whole structure
including the aggregate. It would at the very least make the explain
output prettier, but I think it would avoid repeated execution of the
aggregate in some cases as well.
On Sat, Jul 18, 2009 at 3:13 AM, Greg Stark <gsstark@mit.edu> wrote:
Yeah, are you running into the same issue as well? I tried to figure out a
way around the O(n^2) behavior, but there was no existing direct relationship
between the child subpath and its associated AppendRelInfo. I think an
AppendRelInfo dynahash would work, keyed by child_relid.
The other thing is it would be nice if we could avoid making separate
subplans for each child and instead make one for the whole structure
including the aggregate. It would at the very least make the explain
output prettier, but I think it would avoid repeated execution of the
aggregate in some cases as well.
How would this plan look? I think the repeated execution of the aggregate
comes from having to process the output of each child. So it's O(n)
executions where n is the number of children, assuming each child has the
appropriate index for this optimization.
The other optimization I thought of was that the optimizer might use the
check constraints (from the range partitioning) on the column that I want to
do a MIN/MAX on. Then it would only have to execute the subplan in one child
partition plus the parent partition. But that was more of a wild idea on my
part; I'm not sure if it's possible.
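As a rough sketch of that idea at the SQL level (assuming you already know
which partition can contain the maximum, here 2007-01-31), constraint
exclusion will already prune the other children if you supply the bound
yourself:
SET constraint_exclusion = on;
EXPLAIN SELECT max(b) FROM foo_archive WHERE b >= '2007-01-31'::date;
-- only the parent foo_archive and foo_archive_2007_01_31 should be scanned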
On Sat, Jul 18, 2009 at 6:35 PM, Alan Li <ali@truviso.com> wrote:
Yeah, are you running into the same issue as well? I tried to figure out a
way around the O(n^2) behavior, but there was no existing direct relationship
between the child subpath and its associated AppendRelInfo. I think an
AppendRelInfo dynahash would work, keyed by child_relid.
I don't see any place in my patch where I had to do anything like
this. I do have to loop through all the appendrel elements and skip
over the ones which don't belong to this appendrel which seems weird
to me but it isn't n^2.
When generating a normal Append node, you generate a brand new path for
each child and attach it to the append path. It looks like you're letting
the append rel generate its paths and then digging around looking at those
paths. I wonder if that's the right approach.
How would this plan look? I think the repeated execution of the aggregate
comes from having to process the output of each child. So it's O(n)
executions where n is the number of children, assuming each child has the
appropriate index for this optimization.
No I just mean instead of generating

 InitPlan 1
   Limit
     Index Scan
 InitPlan 2
   Limit
     Index Scan
 Aggregate
   Append
     Result
     Result

I think it would be better to generate this:

 InitPlan 1
   Aggregate
     Append
       Limit 1
         IndexScan
       Limit 1
         IndexScan
 Result
The reason I think this is better is that if the subquery is executed
multiple times it will just return the one precalculated result. In your
plan all the child minimums are precalculated, but it will still have to
re-execute the aggregate and append nodes. That's not terribly expensive,
but we might as well generate the simpler, more efficient plan.
The other optimization I thought of was that the optimizer might use the
check constraints (from the range partitioning) on the column that I want to
do a MIN/MAX on. Then it would only have to execute the subplan in one child
partition plus the parent partition. But that was more of a wild idea on my
part; I'm not sure if it's possible.
Yes, I think this is the long-term goal. That's the whole "better
representation of partitions" plotline. To do this efficiently the
planner should know what the partition key is and what the
partitioning structure is.
In an ideal world we would be able to collapse the whole structure and
eliminate the append relation entirely. To do that we need some way to
indicate that the parent relation is provably empty. I had in mind to
mark it as read-only and the statistics as authoritative since that
seems more useful than just being able to mark it empty. I think there
are a lot of other interesting things we could do with statistics if
we knew they were authoritative for a read-only table. (The read-only
property we would need here is very strong. There would have to be a
vacuum freeze and moreover we would have to know that all tuples were
successfully frozen.)
On Sat, Jul 18, 2009 at 12:04 PM, Greg Stark <gsstark@mit.edu> wrote:
Attached is an updated patch that removes the O(n^2) behavior and the
awkwardness of optimizing the seqscan path just as the plan was about to be
created. Now the optimization is considered while the appendrel is
generating its paths.
I also changed the plan as you had suggested. It now looks like this:
postgres=# explain select max(b) from foo_archive;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1.22..1.23 rows=1 width=8)
-> Append (cost=0.00..1.13 rows=32 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_idx on foo_archive
(cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_01_idx on
foo_archive_2007_01_01 foo_archive (cost=0.00..2723.24 rows=86399 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_02_idx on
foo_archive_2007_01_02 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_03_idx on
foo_archive_2007_01_03 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_04_idx on
foo_archive_2007_01_04 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_05_idx on
foo_archive_2007_01_05 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_06_idx on
foo_archive_2007_01_06 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_07_idx on
foo_archive_2007_01_07 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_08_idx on
foo_archive_2007_01_08 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_09_idx on
foo_archive_2007_01_09 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_10_idx on
foo_archive_2007_01_10 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_11_idx on
foo_archive_2007_01_11 foo_archive (cost=0.00..2719.26 rows=86400 width=8)
-> Limit (cost=0.00..0.03 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_12_idx on
foo_archive_2007_01_12 foo_archive (cost=0.00..1568.27 rows=49601 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_13_idx on
foo_archive_2007_01_13 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_14_idx on
foo_archive_2007_01_14 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_15_idx on
foo_archive_2007_01_15 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_16_idx on
foo_archive_2007_01_16 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_17_idx on
foo_archive_2007_01_17 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_18_idx on
foo_archive_2007_01_18 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_19_idx on
foo_archive_2007_01_19 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_20_idx on
foo_archive_2007_01_20 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_21_idx on
foo_archive_2007_01_21 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_22_idx on
foo_archive_2007_01_22 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_23_idx on
foo_archive_2007_01_23 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_24_idx on
foo_archive_2007_01_24 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_25_idx on
foo_archive_2007_01_25 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_26_idx on
foo_archive_2007_01_26 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_27_idx on
foo_archive_2007_01_27 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_28_idx on
foo_archive_2007_01_28 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_29_idx on
foo_archive_2007_01_29 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_30_idx on
foo_archive_2007_01_30 foo_archive (cost=0.00..73.35 rows=1940 width=8)
-> Limit (cost=0.00..0.04 rows=1 width=8)
-> Index Scan Backward using foo_archive_2007_01_31_idx on
foo_archive_2007_01_31 foo_archive (cost=0.00..73.35 rows=1940 width=8)
(66 rows)
Attachment: optimize_append_minmax_1.patch (text/x-patch)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b375902..0c839b9 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***************
*** 27,32 ****
--- 27,33 ----
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
+ #include "optimizer/planmain.h"
#include "optimizer/planner.h"
#include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 289,294 ****
--- 290,296 ----
double *parent_attrsizes;
int nattrs;
ListCell *l;
+ bool can_optimize_minmax;
/*
* Initialize to compute size estimates for whole append relation.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 309,314 ****
--- 311,319 ----
nattrs = rel->max_attr - rel->min_attr + 1;
parent_attrsizes = (double *) palloc0(nattrs * sizeof(double));
+
+ can_optimize_minmax = can_optimize_append_minmax(root);
+
/*
* Generate access paths for each member relation, and pick the cheapest
* path for each one.
*************** set_append_rel_pathlist(PlannerInfo *roo
*** 427,433 ****
--- 432,454 ----
subpaths = list_concat(subpaths,
((AppendPath *) childpath)->subpaths);
else
+ {
+ if (can_optimize_minmax)
+ {
+ Path *minmax_path;
+ List *targetlist = (List *)
+ adjust_appendrel_attrs((Node *) root->parse->targetList,
+ appinfo);
+
+ minmax_path = optimize_append_minmax(root, targetlist, childrel,
+ childpath);
+
+ if (minmax_path)
+ childpath = minmax_path;
+ }
+
subpaths = lappend(subpaths, childpath);
+ }
/*
* Accumulate size information from each child.
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 6e5c251..6643486 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***************
*** 20,25 ****
--- 20,26 ----
#include <math.h>
#include "access/skey.h"
+ #include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/clauses.h"
*************** create_append_plan(PlannerInfo *root, Ap
*** 546,551 ****
--- 547,554 ----
List *tlist = build_relation_tlist(best_path->path.parent);
List *subplans = NIL;
ListCell *subpaths;
+ bool can_optimize_minmax;
+ Node *limitCount;
/*
* It is possible for the subplans list to contain only one entry, or even
*************** create_append_plan(PlannerInfo *root, Ap
*** 566,577 ****
NULL);
}
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
! subplans = lappend(subplans, create_plan(root, subpath));
}
plan = make_append(subplans, false, tlist);
--- 569,591 ----
NULL);
}
+ can_optimize_minmax = can_optimize_append_minmax(root);
+ if (can_optimize_minmax)
+ limitCount = (Node *) makeConst(INT8OID, -1, sizeof(int64),
+ Int64GetDatum(1), false,
+ FLOAT8PASSBYVAL);
+
/* Normal case with multiple subpaths */
foreach(subpaths, best_path->subpaths)
{
Path *subpath = (Path *) lfirst(subpaths);
+ Plan *subplan = create_plan(root, subpath);
! /* apply the LIMIT 1 on top of the index scan */
! if (can_optimize_minmax && IsA(subplan, IndexScan))
! subplan = (Plan *) make_limit(subplan, NULL, limitCount, 0, 1);
!
! subplans = lappend(subplans, subplan);
}
plan = make_append(subplans, false, tlist);
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 34e9617..f7cdc5e 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** optimize_minmax_aggregates(PlannerInfo *
*** 214,219 ****
--- 214,320 ----
}
/*
+ * can_optimize_append_minmax - check whether optimizing MIN/MAX via indexes
+ * can potentially be pushed down to child relations.
+ *
+ * This checks whether the query is of the form:
+ * 1. SELECT MAX(col) FROM tab WHERE ...;
+ * 2. SELECT MIN(col) FROM tab WHERE ...;
+ */
+ bool
+ can_optimize_append_minmax(PlannerInfo *root)
+ {
+ Query *parse = root->parse;
+ List *tlist = parse->targetList;
+ List *aggs_list;
+
+ /* Nothing to do if query has no aggregates */
+ if (!parse->hasAggs)
+ return false;
+
+ Assert(!parse->setOperations); /* shouldn't get here if a setop */
+ Assert(parse->rowMarks == NIL); /* nor if FOR UPDATE */
+
+ /*
+ * Reject unoptimizable cases.
+ *
+ * We don't handle GROUP BY or windowing, because our current
+ * implementations of grouping require looking at all the rows anyway, and
+ * so there's not much point in optimizing MIN/MAX.
+ */
+ if (parse->groupClause || parse->hasWindowFuncs)
+ return false;
+
+ /* We're only supporting a single MIN/MAX in the targetlist. */
+ if (list_length(tlist) > 1 || parse->havingQual)
+ return false;
+
+ aggs_list = NIL;
+ if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ return false;
+
+ if (list_length(aggs_list) > 1)
+ return false;
+
+ return true;
+ }
+
+ /*
+ * optimize_append_minmax - returns the IndexPath for a child relation as an
+ * optimization for MIN/MAX. See can_optimize_append_minmax() above. Returns
+ * NULL if no IndexPath is available or it is not profitable compared to the
+ * existing best_path.
+ */
+ Path *
+ optimize_append_minmax(PlannerInfo *root, List *tlist, RelOptInfo *rel,
+ Path *best_path)
+ {
+ List *aggs_list;
+ Cost total_cost;
+ MinMaxAggInfo *info;
+ Path agg_p;
+ Path minmax_agg_p;
+
+ /* Pass 1: Get the one MIN/MAX from the targetlist */
+ aggs_list = NIL;
+ if (find_minmax_aggs_walker((Node *) tlist, &aggs_list))
+ return NULL;
+
+ Assert(list_length(aggs_list) == 1);
+
+ /* Pass 2: see if MIN/MAX is optimizable */
+ info = (MinMaxAggInfo *) linitial(aggs_list);
+ if (!build_minmax_path(root, rel, info))
+ return NULL;
+ total_cost = info->pathcost;
+
+ /*
+ * Make the cost comparison.
+ *
+ * We are comparing the cost of the best_path that has to pass all of its
+ * output as input to the aggregate node against the one output tuple in
+ * the optimized path.
+ *
+ * Note that we don't include evaluation cost of the tlist here; this is
+ * OK since it isn't included in best_path's cost either, and should be
+ * the same in either case.
+ */
+ cost_agg(&agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 0, 0,
+ best_path->startup_cost, best_path->total_cost,
+ best_path->parent->rows);
+
+ /* only one input tuple expected from the minmax index path */
+ cost_agg(&minmax_agg_p, root, AGG_PLAIN, list_length(aggs_list),
+ 0, 0, 0, total_cost, 1);
+
+ if (minmax_agg_p.total_cost > agg_p.total_cost)
+ return NULL; /* too expensive */
+
+ return (Path *) info->path;
+ }
+
+ /*
* find_minmax_aggs_walker
* Recursively scan the Aggref nodes in an expression tree, and check
* that each one is a MIN/MAX aggregate. If so, build a list of the
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0dd2bcc..61dd0a5 100644
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern void query_planner(PlannerInfo *r
*** 34,39 ****
--- 34,42 ----
*/
extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist,
Path *best_path);
+ extern bool can_optimize_append_minmax(PlannerInfo *root);
+ extern Path *optimize_append_minmax(PlannerInfo *root, List *tlist,
+ RelOptInfo *rel, Path *best_path);
/*
* prototypes for plan/createplan.c
On Mon, Jul 20, 2009 at 7:40 PM, Alan Li <ali@truviso.com> wrote:
Attached is an updated patch that removes the O(n^2) behavior and the
awkwardness of optimizing the seqscan path as the plan was about to be
created. Now, the optimization is considered when appendrel is generating
the paths. I also changed the plan as you had suggested. It now looks like this:
Hm, that's not quite the plan I described either. I had in mind to
mirror the existing min/max optimization, which puts the whole thing in
an InitPlan and puts a Result node in place of the actual plan. Your
original patch did that for each subplan, but I thought it would be
better to do it for the whole aggregate.
However, the more I think about it, the less I understand why Tom
arranged to do that for the min/max optimization in the first place. For
subqueries where that makes sense, it would surely happen anyway, as in
the first example below. And for joins where it's necessary, the planner
knows to put in a Materialize node, which sounds just as good.
Here's what happens with your current patch in the case I was
concerned about -- note that the planner automatically detects the
case and turns the whole subplan into an initplan anyways:
postgres=# explain select * from y where j = (select min(i) from x) ;
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on y (cost=40.12..80.12 rows=12 width=4)
Filter: (j = $0)
InitPlan 1 (returns $0)
-> Aggregate (cost=40.11..40.12 rows=1 width=4)
-> Append (cost=0.00..34.10 rows=2403 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi on x (cost=0.00..80.25
rows=2400 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi2 on x2 x
(cost=0.00..80.25 rows=2400 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi3 on x3 x
(cost=0.00..80.25 rows=2400 width=4)
-> Seq Scan on x1 x (cost=0.00..34.00 rows=2400 width=4)
(12 rows)
And here's another case where you wouldn't want multiple execution --
but the planner here figures out to materialize the result:
postgres=# explain select * from y left outer join (select min(i) as i
from x) as x on (i=j);
QUERY PLAN
--------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=40.13..128.13 rows=2400 width=8)
Join Filter: ((min(public.x.i)) = y.j)
-> Seq Scan on y (cost=0.00..34.00 rows=2400 width=4)
-> Materialize (cost=40.13..40.14 rows=1 width=4)
-> Aggregate (cost=40.11..40.12 rows=1 width=4)
-> Append (cost=0.00..34.10 rows=2403 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi on x
(cost=0.00..80.25 rows=2400 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi2 on x2 x
(cost=0.00..80.25 rows=2400 width=4)
-> Limit (cost=0.00..0.03 rows=1 width=4)
-> Index Scan using xi3 on x3 x
(cost=0.00..80.25 rows=2400 width=4)
-> Seq Scan on x1 x (cost=0.00..34.00 rows=2400 width=4)
(13 rows)
So I'm a bit puzzled why Tom's min/max optimization bothers with the
whole Initplan/Result business itself anyways.
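For anyone trying to reproduce the two plans above, a guessed-at setup would
be something like the following (the original message does not include it, so
the table shapes and row counts are assumptions):
CREATE TABLE x (i int);
CREATE INDEX xi ON x(i);
CREATE TABLE x1 () INHERITS (x);   -- child with no index, hence the seqscan
CREATE TABLE x2 () INHERITS (x);
CREATE INDEX xi2 ON x2(i);
CREATE TABLE x3 () INHERITS (x);
CREATE INDEX xi3 ON x3(i);
CREATE TABLE y (j int);
INSERT INTO x  SELECT generate_series(1, 2400);
INSERT INTO x1 SELECT generate_series(1, 2400);
INSERT INTO x2 SELECT generate_series(1, 2400);
INSERT INTO x3 SELECT generate_series(1, 2400);
INSERT INTO y  SELECT generate_series(1, 2400);
ANALYZE;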
On Mon, Jul 20, 2009 at 12:29 PM, Greg Stark <gsstark@mit.edu> wrote:
Yes, this is what I saw when I was testing this, and I think it would be
best to leave the decision of creating an InitPlan/Materialize to the
planning of the outer query. That's why I didn't bother to specifically
put an InitPlan on top of the Aggregate in the new patch.
Alan