Partition-wise aggregation/grouping

Started by Jeevan Chalke, almost 9 years ago, 153 messages

jeevan.chalke@enterprisedb.com

almost 9 years ago

1 attachment(s)

Hi all,

Declarative partitioning is supported in PostgreSQL 10, and work is already in
progress to support partition-wise joins. Here is a proposal for partition-wise
aggregation/grouping. Our initial performance measurements show roughly a 7
times improvement when the partitions are on foreign servers and approximately
15% improvement when the partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the GROUP BY clause contains the partition key, all the rows
belonging to a given group come from one partition, so aggregates can be
computed completely within each partition. Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce the
final aggregates. This technique improves performance because:
i. When partitions are located on a foreign server, we can push the
aggregate down to the foreign server.
ii. If the hash table for each partition fits in memory, but the one for the
whole relation does not, each partition-wise aggregate can use an in-memory
hash table.
iii. Aggregation at the level of partitions can exploit properties of the
partitions, such as their indexes and storage.
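
As a rough illustration of the two cases described above (this is not code
from the patch), the following C sketch computes count(*) per group for each
partition separately and then merges the per-partition results. The names
Partition, aggregate_partition, combine_partitions and the MAX_KEY bound are
all hypothetical and exist only for this example. When the grouping column is
the partition key, no group appears in two partitions, so the merge step
degenerates into a plain append.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEY 64				/* hypothetical bound on group key values */

/* One partition's rows, reduced to the grouping column only. */
typedef struct Partition
{
	const int  *keys;
	size_t		nrows;
} Partition;

/* count(*) ... GROUP BY key, computed over a single partition. */
static void
aggregate_partition(const Partition *part, long counts[MAX_KEY])
{
	for (size_t i = 0; i < MAX_KEY; i++)
		counts[i] = 0;

	for (size_t i = 0; i < part->nrows; i++)
	{
		assert(part->keys[i] >= 0 && part->keys[i] < MAX_KEY);
		counts[part->keys[i]]++;
	}
}

/*
 * Merge per-partition results into final_counts.
 *
 * Returns how many times a partition's result had to be added to a group
 * that an earlier partition had already produced.  A return value of 0 means
 * the partitions produced disjoint groups, i.e. the results could simply
 * have been appended.
 */
static size_t
combine_partitions(const Partition *parts, size_t nparts,
				   long final_counts[MAX_KEY])
{
	size_t		merges = 0;

	for (size_t k = 0; k < MAX_KEY; k++)
		final_counts[k] = 0;

	for (size_t p = 0; p < nparts; p++)
	{
		long		partial[MAX_KEY];

		aggregate_partition(&parts[p], partial);

		for (size_t k = 0; k < MAX_KEY; k++)
		{
			if (partial[k] == 0)
				continue;
			if (final_counts[k] != 0)
				merges++;		/* group spans more than one partition */
			final_counts[k] += partial[k];
		}
	}

	return merges;
}
```

With range partitions on the grouping column, combine_partitions() returns 0
and the combine step is unnecessary. With any other grouping column it is the
combine step that makes the result correct.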

Attached is an experimental patch for this, based on the partition-wise join
patches posted in [1].

This patch currently implements partition-wise aggregation only when the
GROUP BY clause contains the partitioning key. The query below, on a
partitioned table with 3 partitions of 1M rows each and producing 30 groups in
total, showed a 15% improvement over non-partition-wise aggregation. The same
query showed a 7 times improvement when the partitions were located on
foreign servers.
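
The eligibility test in the attached patch (have_grouping_by_partkey) requires
the partition keys to be the leading GROUP BY expressions. A simplified sketch
of that check is shown here, assuming each key and each grouping expression
can be represented by a plain column name. The function name
partkeys_lead_group_by and the string representation are hypothetical; the
patch compares expression nodes instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if the first num_pks GROUP BY columns are exactly the
 * partition key columns, in partition key order.
 */
static bool
partkeys_lead_group_by(const char *const *part_keys, size_t num_pks,
					   const char *const *group_cols, size_t num_group_cols)
{
	size_t		matched = 0;

	for (size_t i = 0; i < num_group_cols && matched < num_pks; i++)
	{
		/* The i'th grouping column must be the next unmatched key. */
		if (strcmp(part_keys[matched], group_cols[i]) != 0)
			return false;
		matched++;
	}

	/* Every partition key must have been matched. */
	return matched >= num_pks;
}
```

Note that this is stricter than "the group clause contains the partition key":
for a table partitioned on a, GROUP BY b, a is rejected by this check even
though each group still comes from a single partition.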

Here is the sample plan:

postgres=# set enable_partition_wise_agg to true;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Append  (cost=5100.00..61518.90 rows=30 width=12) (actual time=324.837..944.804 rows=30 loops=1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=324.837..324.838 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p1 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=309.954..309.956 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p2 plt1)
   ->  Foreign Scan  (cost=5100.00..20506.30 rows=10 width=12) (actual time=310.002..310.004 rows=10 loops=1)
         Relations: Aggregate on (public.fplt1_p3 plt1)
 Planning time: 0.370 ms
 Execution time: 945.384 ms
(9 rows)

postgres=# set enable_partition_wise_agg to false;
SET
postgres=# EXPLAIN ANALYZE SELECT a, count(*) FROM plt1 GROUP BY a;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=121518.01..121518.31 rows=30 width=12) (actual time=6498.452..6498.459 rows=30 loops=1)
   Group Key: plt1.a
   ->  Append  (cost=0.00..106518.00 rows=3000001 width=4) (actual time=0.595..5769.592 rows=3000000 loops=1)
         ->  Seq Scan on plt1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=1)
         ->  Foreign Scan on fplt1_p1  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.587..1844.506 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p2  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.384..1839.633 rows=1000000 loops=1)
         ->  Foreign Scan on fplt1_p3  (cost=100.00..35506.00 rows=1000000 width=4) (actual time=0.402..1876.505 rows=1000000 loops=1)
 Planning time: 0.251 ms
 Execution time: 6499.018 ms
(9 rows)

The patch needs a lot of improvement, including:
1. Support for partial partition-wise aggregation
2. Estimating the number of groups for every partition
3. Estimating the cost of partition-wise aggregation based on sample
partitions, similar to partition-wise join
and much more.
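
For item 2, the WIP patch currently takes the group estimate made for the
parent relation and divides it evenly by the number of partitions. The sketch
below mirrors that placeholder logic. clamp_estimate is a simplified stand-in
written for this example and is not PostgreSQL's clamp_row_est();
groups_per_partition is likewise a hypothetical name.

```c
#include <assert.h>

/*
 * Simplified stand-in for a row-estimate clamp: never return less than one
 * row, and round positive estimates to the nearest whole number.
 */
static double
clamp_estimate(double nrows)
{
	if (nrows <= 1.0)
		return 1.0;
	return (double) (long long) (nrows + 0.5);
}

/*
 * Spread the parent's group estimate evenly over its partitions.  If the
 * partition count is not positive, the estimate is returned unchanged.
 */
static double
groups_per_partition(double total_groups, int nparts)
{
	assert(total_groups >= 0.0);

	if (nparts > 0)
		return clamp_estimate(total_groups / nparts);
	return total_groups;
}
```

For the example query, 30 groups over 3 partitions gives an estimate of 10
groups per partition, which matches the rows=10 shown for each Foreign Scan in
the plan above. An even split is only a first approximation; skewed partitions
would need per-partition statistics.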

In order to support partial aggregation on foreign partitions, we need support
for fetching partially aggregated results from the foreign server. That can be
handled as a separate follow-on patch.
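
To show why finished aggregate values from each partition are not enough in
the partial case, here is a small sketch using avg() as the example. It
assumes a partial state made of a running sum and a row count; AvgPartial,
avg_partial, avg_combine and avg_final are hypothetical names for this
illustration and do not describe how postgres_fdw or the executor represent
aggregate states.

```c
#include <assert.h>
#include <stddef.h>

/* Partial state for avg(): enough information to be combined later. */
typedef struct AvgPartial
{
	long		sum;
	long		count;
} AvgPartial;

/* Partial aggregation over the rows of one partition. */
static AvgPartial
avg_partial(const int *values, size_t n)
{
	AvgPartial	state = {0, 0};

	for (size_t i = 0; i < n; i++)
	{
		state.sum += values[i];
		state.count++;
	}
	return state;
}

/* Combine the partial states of two partitions. */
static AvgPartial
avg_combine(AvgPartial a, AvgPartial b)
{
	AvgPartial	state = {a.sum + b.sum, a.count + b.count};

	return state;
}

/* Finalize: only now is the average itself computed. */
static double
avg_final(AvgPartial state)
{
	assert(state.count > 0);
	return (double) state.sum / (double) state.count;
}
```

If one partition holds {1, 2, 3} and another holds {10}, combining the states
gives sum 16 and count 4, so the average is 4. Averaging the two per-partition
averages (2 and 10) would give 6, which is wrong; this is why the foreign
server would have to return the partial state rather than the final value.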

Though there is a lot of work still to be done, I would like to get
suggestions and opinions from hackers.

I would like to thank Ashutosh Bapat for providing a draft patch and helping
me off-list with this feature while he is busy working on the partition-wise
join feature.

[1]: /messages/by-id/CAFjFpRcbY2QN3cfeMTzVEoyF5Lfku-ijyNR=PbXj1e=9a=qMoQ@mail.gmail.com

Thanks

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

pg_partwise_agg_WIP.patch (application/x-download)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6ef1e48..8388ea7 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1064,6 +1064,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(context->foreignrel->reloptkind != RELOPT_UPPER_REL ||
 		   IS_JOIN_REL(scanrel) ||
+		   scanrel->reloptkind == RELOPT_OTHER_MEMBER_REL ||
 		   scanrel->reloptkind == RELOPT_BASEREL);
 
 	/* Construct FROM clause */
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 059c5c3..d968832 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7181,3 +7181,115 @@ AND ftoptions @> array['fetch_size=60000'];
 (1 row)
 
 ROLLBACK;
+-- Partition-wise aggregates with FDW
+CREATE TABLE plt1 (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE plt1_p1 (a int, b int, c text);
+CREATE TABLE plt1_p2 (a int, b int, c text);
+CREATE TABLE plt1_p3 (a int, b int, c text);
+INSERT INTO plt1_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO plt1_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO plt1_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fplt1_p1 PARTITION OF plt1 FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'plt1_p1');
+CREATE FOREIGN TABLE fplt1_p2 PARTITION OF plt1 FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'plt1_p2');;
+CREATE FOREIGN TABLE fplt1_p3 PARTITION OF plt1 FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'plt1_p3');;
+ANALYZE plt1;
+ANALYZE fplt1_p1;
+ANALYZE fplt1_p2;
+ANALYZE fplt1_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled
+SET enable_partition_wise_agg TO false;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+                           QUERY PLAN                            
+-----------------------------------------------------------------
+ Sort
+   Output: plt1.a, (sum(plt1.b)), (min(plt1.b)), (count(*))
+   Sort Key: plt1.a
+   ->  HashAggregate
+         Output: plt1.a, sum(plt1.b), min(plt1.b), count(*)
+         Group Key: plt1.a
+         ->  Append
+               ->  Seq Scan on public.plt1
+                     Output: plt1.a, plt1.b
+               ->  Foreign Scan on public.fplt1_p1
+                     Output: fplt1_p1.a, fplt1_p1.b
+                     Remote SQL: SELECT a, b FROM public.plt1_p1
+               ->  Foreign Scan on public.fplt1_p2
+                     Output: fplt1_p2.a, fplt1_p2.b
+                     Remote SQL: SELECT a, b FROM public.plt1_p2
+               ->  Foreign Scan on public.fplt1_p3
+                     Output: fplt1_p3.a, fplt1_p3.b
+                     Remote SQL: SELECT a, b FROM public.plt1_p3
+(18 rows)
+
+-- Plan when partition-wise-agg is enabled
+SET enable_partition_wise_agg TO true;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+NOTICE:  partition-wise grouping is possible.
+                                         QUERY PLAN                                          
+---------------------------------------------------------------------------------------------
+ Sort
+   Output: fplt1_p1.a, (sum(fplt1_p1.b)), (min(fplt1_p1.b)), (count(*))
+   Sort Key: fplt1_p1.a
+   ->  Append
+         ->  Foreign Scan
+               Output: fplt1_p1.a, (sum(fplt1_p1.b)), (min(fplt1_p1.b)), (count(*))
+               Relations: Aggregate on (public.fplt1_p1 plt1)
+               Remote SQL: SELECT a, sum(b), min(b), count(*) FROM public.plt1_p1 GROUP BY a
+         ->  Foreign Scan
+               Output: fplt1_p2.a, (sum(fplt1_p2.b)), (min(fplt1_p2.b)), (count(*))
+               Relations: Aggregate on (public.fplt1_p2 plt1)
+               Remote SQL: SELECT a, sum(b), min(b), count(*) FROM public.plt1_p2 GROUP BY a
+         ->  Foreign Scan
+               Output: fplt1_p3.a, (sum(fplt1_p3.b)), (min(fplt1_p3.b)), (count(*))
+               Relations: Aggregate on (public.fplt1_p3 plt1)
+               Remote SQL: SELECT a, sum(b), min(b), count(*) FROM public.plt1_p3 GROUP BY a
+(16 rows)
+
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+NOTICE:  partition-wise grouping is possible.
+ a  | sum  | min | count 
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
+  2 | 2200 |   2 |   100
+  3 | 2300 |   3 |   100
+  4 | 2400 |   4 |   100
+  5 | 2500 |   5 |   100
+  6 | 2600 |   6 |   100
+  7 | 2700 |   7 |   100
+  8 | 2800 |   8 |   100
+  9 | 2900 |   9 |   100
+ 10 | 2000 |   0 |   100
+ 11 | 2100 |   1 |   100
+ 12 | 2200 |   2 |   100
+ 13 | 2300 |   3 |   100
+ 14 | 2400 |   4 |   100
+ 15 | 2500 |   5 |   100
+ 16 | 2600 |   6 |   100
+ 17 | 2700 |   7 |   100
+ 18 | 2800 |   8 |   100
+ 19 | 2900 |   9 |   100
+ 20 | 2000 |   0 |   100
+ 21 | 2100 |   1 |   100
+ 22 | 2200 |   2 |   100
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+ 25 | 2500 |   5 |   100
+ 26 | 2600 |   6 |   100
+ 27 | 2700 |   7 |   100
+ 28 | 2800 |   8 |   100
+ 29 | 2900 |   9 |   100
+(30 rows)
+
+-- Clean-up
+DROP FOREIGN TABLE fplt1_p3;
+DROP FOREIGN TABLE fplt1_p2;
+DROP FOREIGN TABLE fplt1_p1;
+DROP TABLE plt1_p3;
+DROP TABLE plt1_p2;
+DROP TABLE plt1_p1;
+DROP TABLE plt1;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 22acba8..630c374 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -349,7 +349,8 @@ static bool postgresRecheckForeignScan(ForeignScanState *node,
 static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 UpperRelationKind stage,
 							 RelOptInfo *input_rel,
-							 RelOptInfo *output_rel);
+							 RelOptInfo *output_rel,
+							 PathTarget *target);
 
 /*
  * Helper functions
@@ -415,7 +416,8 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 								Path *epq_path);
 static void add_foreign_grouping_paths(PlannerInfo *root,
 						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel);
+						   RelOptInfo *grouped_rel,
+						   PathTarget *target);
 
 
 /*
@@ -2688,7 +2690,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		else if (foreignrel->reloptkind == RELOPT_UPPER_REL)
 		{
 			PgFdwRelationInfo *ofpinfo;
-			PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+			PathTarget *ptarget = fpinfo->grouped_target;
 			AggClauseCosts aggcosts;
 			double		input_rows;
 			int			numGroupCols;
@@ -4536,7 +4538,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 	 * different from those in the plan's targetlist. Use a copy of path
 	 * target to record the new sortgrouprefs.
 	 */
-	grouping_target = copy_pathtarget(root->upper_targets[UPPERREL_GROUP_AGG]);
+	grouping_target = copy_pathtarget(fpinfo->grouped_target);
 
 	/*
 	 * Evaluate grouping targets and check whether they are safe to push down
@@ -4715,7 +4717,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
  */
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
-							 RelOptInfo *input_rel, RelOptInfo *output_rel)
+							 RelOptInfo *input_rel, RelOptInfo *output_rel,
+							 PathTarget *target)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -4735,7 +4738,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel);
+	add_foreign_grouping_paths(root, input_rel, output_rel, target);
 }
 
 /*
@@ -4747,13 +4750,12 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
  */
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel)
+						   RelOptInfo *grouped_rel, PathTarget *target)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
 	PgFdwRelationInfo *fpinfo = grouped_rel->fdw_private;
 	ForeignPath *grouppath;
-	PathTarget *grouping_target;
 	double		rows;
 	int			width;
 	Cost		startup_cost;
@@ -4764,7 +4766,8 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		!root->hasHavingQual)
 		return;
 
-	grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
+	/* Store passed-in target in fpinfo for later use */
+	fpinfo->grouped_target = target;
 
 	/* save the input_rel as outerrel in fpinfo */
 	fpinfo->outerrel = input_rel;
@@ -4795,7 +4798,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Create and add foreign path to the grouping relation. */
 	grouppath = create_foreignscan_path(root,
 										grouped_rel,
-										grouping_target,
+										target,
 										rows,
 										startup_cost,
 										total_cost,
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 57dbb79..99cecb7 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -95,6 +95,7 @@ typedef struct PgFdwRelationInfo
 
 	/* Grouping information */
 	List	   *grouped_tlist;
+	PathTarget *grouped_target;
 
 	/* Subquery information */
 	bool		make_outerrel_subquery;	/* do we deparse outerrel as a
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 8f3edc1..f02ec8a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1706,3 +1706,47 @@ WHERE ftrelid = 'table30000'::regclass
 AND ftoptions @> array['fetch_size=60000'];
 
 ROLLBACK;
+
+
+-- Partition-wise aggregates with FDW
+CREATE TABLE plt1 (a int, b int, c text) PARTITION BY RANGE(a);
+
+CREATE TABLE plt1_p1 (a int, b int, c text);
+CREATE TABLE plt1_p2 (a int, b int, c text);
+CREATE TABLE plt1_p3 (a int, b int, c text);
+
+INSERT INTO plt1_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO plt1_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO plt1_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+
+-- Create foreign partitions
+CREATE FOREIGN TABLE fplt1_p1 PARTITION OF plt1 FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'plt1_p1');
+CREATE FOREIGN TABLE fplt1_p2 PARTITION OF plt1 FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'plt1_p2');;
+CREATE FOREIGN TABLE fplt1_p3 PARTITION OF plt1 FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'plt1_p3');;
+
+ANALYZE plt1;
+ANALYZE fplt1_p1;
+ANALYZE fplt1_p2;
+ANALYZE fplt1_p3;
+
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled
+SET enable_partition_wise_agg TO false;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+
+-- Plan when partition-wise-agg is enabled
+SET enable_partition_wise_agg TO true;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+SELECT a, sum(b), min(b), count(*) FROM plt1 GROUP BY a ORDER BY 1;
+
+
+-- Clean-up
+DROP FOREIGN TABLE fplt1_p3;
+DROP FOREIGN TABLE fplt1_p2;
+DROP FOREIGN TABLE fplt1_p1;
+DROP TABLE plt1_p3;
+DROP TABLE plt1_p2;
+DROP TABLE plt1_p1;
+DROP TABLE plt1;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 8143f80..0ab5c56 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -130,8 +130,6 @@ static void subquery_push_qual(Query *subquery,
 static void recurse_push_qual(Node *setOp, Query *topquery,
 				  RangeTblEntry *rte, Index rti, Node *qual);
 static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
-static void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
-						List *live_childrels);
 
 
 /*
@@ -1338,7 +1336,7 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  * parameterization or ordering. Similarly it collects partial paths from
  * non-dummy children to create partial append paths.
  */
-static void
+void
 add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 						List *live_childrels)
 {
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 220c81c..b2edaca 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -128,6 +128,7 @@ bool		enable_mergejoin = true;
 bool		enable_hashjoin = true;
 bool		enable_gathermerge = true;
 bool		enable_partition_wise_join = false;
+bool		enable_partition_wise_agg = true;
 
 typedef struct
 {
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 0aae4ca..929e0a6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1610,6 +1610,7 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
 {
 	Sort	   *plan;
 	Plan	   *subplan;
+	Relids		relids;
 
 	/*
 	 * We don't want any excess columns in the sorted tuples, so request a
@@ -1619,7 +1620,12 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
 	subplan = create_plan_recurse(root, best_path->subpath,
 								  flags | CP_SMALL_TLIST);
 
-	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
+	/*
+	 * TODO: we need to fix something here. The "other" upper rels are not
+	 * marked as "OTHER" rels and may not have relids.
+	 */
+	relids = IS_OTHER_REL(best_path->subpath->parent) ? best_path->path.parent->relids : NULL;
+	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, relids);
 
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
@@ -3393,15 +3399,8 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	/* Copy foreign server OID; likewise, no need to make FDW do this */
 	scan_plan->fs_server = rel->serverid;
 
-	/*
-	 * Likewise, copy the relids that are represented by this foreign scan. An
-	 * upper rel doesn't have relids set, but it covers all the base relations
-	 * participating in the underlying scan, so use root's all_baserels.
-	 */
-	if (rel->reloptkind == RELOPT_UPPER_REL)
-		scan_plan->fs_relids = root->all_baserels;
-	else
-		scan_plan->fs_relids = best_path->path.parent->relids;
+	/* Likewise, copy the relids from Path to Plan */
+	scan_plan->fs_relids = best_path->path.parent->relids;
 
 	/*
 	 * If this is a foreign join, and to make it valid to push down we had to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 17fae4f..9f7db45 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -157,6 +157,15 @@ static PathTarget *make_sort_input_target(PlannerInfo *root,
 					   bool *have_postponed_srfs);
 static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
+static void try_partition_wise_grouping(PlannerInfo *root,
+							RelOptInfo *input_rel,
+							RelOptInfo *grouped_rel,
+							PathTarget *target,
+							const AggClauseCosts *agg_costs,
+							List *rollup_lists,
+							List *rollup_groupclauses);
+static bool have_grouping_by_partkey(RelOptInfo *input_rel, PathTarget *target,
+						 List *groupClause);
 
 
 /*****************************************************************************
@@ -2067,7 +2076,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	if (final_rel->fdwroutine &&
 		final_rel->fdwroutine->GetForeignUpperPaths)
 		final_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_FINAL,
-													current_rel, final_rel);
+													current_rel, final_rel,
+													NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
@@ -3283,7 +3293,18 @@ create_grouping_paths(PlannerInfo *root,
 	ListCell   *lc;
 
 	/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
-	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
+	if (IS_OTHER_REL(input_rel))
+	{
+
+		/*
+		 * TODO: We should mark these rels as "other upper" rels similar to
+		 * "other" join and base relations.
+		 */
+		grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+									  input_rel->relids);
+	}
+	else
+		grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
@@ -3303,6 +3324,9 @@ create_grouping_paths(PlannerInfo *root,
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
 
+	/* Copy input rels's relids to grouped rel */
+	grouped_rel->relids = input_rel->relids;
+
 	/*
 	 * Check for degenerate grouping.
 	 */
@@ -3377,6 +3401,34 @@ create_grouping_paths(PlannerInfo *root,
 									  rollup_groupclauses);
 
 	/*
+	 * Number of groups estimated above is based on parent relation.  However
+	 * we need to estimate the number of groups for the child.  For that we
+	 * must know the number of partitions.  Find that and devise new estimate
+	 * for number of groups.
+	 *
+	 * FIXME: We might need to do this in get_number_of_groups() itself.  But
+	 * not sure at this time.  Need to revise the logic.
+	 */
+	if (IS_OTHER_REL(input_rel))
+	{
+		RelOptInfo *rel;
+
+		/* Find top-most parent rel */
+		if (IS_JOIN_REL(input_rel))
+			rel = find_join_rel(root, input_rel->top_parent_relids);
+		else
+			rel = find_base_rel(root,
+						bms_singleton_member(input_rel->top_parent_relids));
+
+		/*
+		 * Divide estimated number of groups by number of children to get
+		 * number of groups estimate for child rel.
+		 */
+		if (rel->part_scheme->nparts > 0)
+			dNumGroups = clamp_row_est(dNumGroups / rel->part_scheme->nparts);
+	}
+
+	/*
 	 * Determine whether it's possible to perform sort-based implementations
 	 * of grouping.  (Note that if groupClause is empty,
 	 * grouping_is_sortable() is trivially true, and all the
@@ -3441,6 +3493,11 @@ create_grouping_paths(PlannerInfo *root,
 		/* Insufficient support for partial mode. */
 		try_parallel_aggregation = false;
 	}
+	else if (IS_OTHER_REL(input_rel))
+	{
+		/* TODO: enable parallel query for partition-wise grouping. */
+		try_parallel_aggregation = false;
+	}
 	else
 	{
 		/* Everything looks good. */
@@ -3855,13 +3912,21 @@ create_grouping_paths(PlannerInfo *root,
 				 errdetail("Some of the datatypes only support hashing, while others only support sorting.")));
 
 	/*
+	 * If input relation is partitioned check if we can perform partition-wise
+	 * grouping and/or aggregation.
+	 */
+	try_partition_wise_grouping(root, input_rel, grouped_rel, target,
+								agg_costs, rollup_lists, rollup_groupclauses);
+
+	/*
 	 * If there is an FDW that's responsible for all baserels of the query,
 	 * let it consider adding ForeignPaths.
 	 */
 	if (grouped_rel->fdwroutine &&
 		grouped_rel->fdwroutine->GetForeignUpperPaths)
 		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
-													  input_rel, grouped_rel);
+													  input_rel, grouped_rel,
+													  target);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
@@ -3885,6 +3950,191 @@ create_grouping_paths(PlannerInfo *root,
 }
 
 /*
+ * If the input relation is partitioned and the partition keys are leading
+ * group by clauses, each partition produces a different set of groups.
+ * Aggregates within each such group can be computed partition-wise. This
+ * might be optimal because of presence of suitable paths with pathkeys or
+ * because the hash tables for most of the partitions fit in the memory.
+ */
+static void
+try_partition_wise_grouping(PlannerInfo *root,
+							RelOptInfo *input_rel,
+							RelOptInfo *grouped_rel,
+							PathTarget *target,
+							const AggClauseCosts *agg_costs,
+							List *rollup_lists,
+							List *rollup_groupclauses)
+{
+	Query  *query = root->parse;
+	int		nparts;
+	int		cnt_parts;
+	PartitionScheme	part_scheme = input_rel->part_scheme;
+	RelOptInfo **part_rels;
+	List   *live_children = NIL;
+	PathTarget  *scanjoin_target;
+	ListCell *lc;
+
+	/* Nothing to do, if user disabled partition-wise aggregation. */
+	if (!enable_partition_wise_agg)
+		return;
+
+	/* Do not handle grouping sets for now.  */
+	if (rollup_groupclauses || rollup_lists)
+		return;
+
+	/* Nothing to do, if the input relation is not partitioned. */
+	if (!part_scheme)
+		return;
+
+	Assert(input_rel->part_rels);
+
+	/*
+	 * TODO: for now do nothing if partition keys are not leading group by
+	 * clauses. In general we may calculate partial aggregates for each
+	 * partition and combine them.
+	 */
+	if (!have_grouping_by_partkey(input_rel, target, query->groupClause))
+		return;
+
+	/* TODO: should be removed in final version */
+	elog(NOTICE, "partition-wise grouping is possible.");
+
+	nparts = part_scheme->nparts;
+	grouped_rel->part_scheme = input_rel->part_scheme;
+	part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
+	grouped_rel->part_rels = part_rels;
+
+	/* Add paths for partition-wise aggregation/grouping. */
+	for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
+	{
+		RelOptInfo *input_child_rel = input_rel->part_rels[cnt_parts];
+		PathTarget *child_target = copy_pathtarget(target);
+		List	   *appinfos = find_appinfos_by_relids(root,
+													   input_child_rel->relids);
+
+		/*
+		 * Now that there can be multiple grouping relations, if we have to
+		 * manage those in the root, we need separate identifiers for those.
+		 * What better identifier than the input relids themselves?
+		 */
+		part_rels[cnt_parts] = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+											   input_child_rel->relids);
+
+		/* Ignore empty children. They contribute nothing. */
+		if (IS_DUMMY_REL(input_child_rel))
+		{
+			mark_dummy_rel(part_rels[cnt_parts]);
+			continue;
+		}
+		else
+			live_children = lappend(live_children, part_rels[cnt_parts]);
+
+		/*
+		 * Forcibly apply scan/join target to all the Paths for the scan/join
+		 * rel.
+		 *
+		 * In principle we should re-run set_cheapest() here to identify the
+		 * cheapest path, but it seems unlikely that adding the same tlist
+		 * eval costs to all the paths would change that, so we don't bother.
+		 * Instead, just assume that the cheapest-startup and cheapest-total
+		 * paths remain so.  (There should be no parameterized paths anymore,
+		 * so we needn't worry about updating cheapest_parameterized_paths.)
+		 */
+		scanjoin_target = copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
+		scanjoin_target->exprs = (List *) adjust_appendrel_attrs(root, (Node *) scanjoin_target->exprs,
+																 appinfos);
+
+		foreach(lc, input_child_rel->pathlist)
+		{
+			Path	   *subpath = (Path *) lfirst(lc);
+			Path	   *path;
+
+			Assert(subpath->param_info == NULL);
+			path = apply_projection_to_path(root, input_child_rel,
+											subpath, scanjoin_target);
+			/* If we had to add a Result, path is different from subpath */
+			if (path != subpath)
+			{
+				lfirst(lc) = path;
+				if (subpath == input_child_rel->cheapest_startup_path)
+					input_child_rel->cheapest_startup_path = path;
+				if (subpath == input_child_rel->cheapest_total_path)
+					input_child_rel->cheapest_total_path = path;
+			}
+		}
+
+		/*
+		 * TODO:
+		 * We should somehow make this target available for FDWs, which are
+		 * expected to fetch it directly from root->upper_targets. That array
+		 * can hold only one target for each kind of upper rel. We will now
+		 * have many such upper relations.
+		 */
+		child_target->exprs = (List *) adjust_appendrel_attrs(root,
+														(Node *) target->exprs,
+																	 appinfos);
+
+		create_grouping_paths(root, input_child_rel, child_target, agg_costs,
+							  rollup_lists, rollup_groupclauses);
+
+	}
+
+	/*
+	 * add_paths_to_append_rel() sets the path target from the given relation.
+	 * In this case grouped_rel doesn't have a target set. So temporarily set
+	 * it.
+	 * TODO: probably we should do something better than this.
+	 */
+	grouped_rel->reltarget = target;
+	add_paths_to_append_rel(root, grouped_rel, live_children);
+	grouped_rel->reltarget = NULL;
+}
+
+/*
+ * Returns true if partition keys of the given relation are leading group by
+ * clauses.
+ */
+static bool
+have_grouping_by_partkey(RelOptInfo *input_rel, PathTarget *target,
+						 List *groupClause)
+{
+	PartitionScheme	part_scheme = input_rel->part_scheme;
+	ListCell   *lc;
+	List	   *tlist = make_tlist_from_pathtarget(target);
+	List	   *group_exprs = get_sortgrouplist_exprs(groupClause, tlist);
+	int			cnt_pk = 0;
+	int			num_pks;
+
+	/* Input relation should be partitioned. */
+	Assert(part_scheme);
+
+	num_pks = part_scheme->partnatts;
+
+	foreach(lc, group_exprs)
+	{
+		Expr   *group_expr = lfirst(lc);
+		List   *pk_exprs;
+
+		/* All partition keys are present in the group clause. */
+		if (cnt_pk >= num_pks)
+			return true;
+
+		pk_exprs = input_rel->partexprs[cnt_pk];
+
+		if (!list_member(pk_exprs, group_expr))
+			return false;
+
+		cnt_pk++;
+	}
+
+	/* All partition keys are present in the group clause. */
+	if (cnt_pk >= num_pks)
+		return true;
+
+	return false;
+}
+
+/*
  * create_window_paths
  *
  * Build a new upperrel containing Paths for window-function evaluation.
@@ -3959,7 +4209,8 @@ create_window_paths(PlannerInfo *root,
 	if (window_rel->fdwroutine &&
 		window_rel->fdwroutine->GetForeignUpperPaths)
 		window_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_WINDOW,
-													 input_rel, window_rel);
+													 input_rel, window_rel,
+													 NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
@@ -4263,7 +4514,8 @@ create_distinct_paths(PlannerInfo *root,
 	if (distinct_rel->fdwroutine &&
 		distinct_rel->fdwroutine->GetForeignUpperPaths)
 		distinct_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_DISTINCT,
-													input_rel, distinct_rel);
+													input_rel, distinct_rel,
+													NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
@@ -4405,7 +4657,8 @@ create_ordered_paths(PlannerInfo *root,
 	if (ordered_rel->fdwroutine &&
 		ordered_rel->fdwroutine->GetForeignUpperPaths)
 		ordered_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_ORDERED,
-													  input_rel, ordered_rel);
+													  input_rel, ordered_rel,
+													  NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7f423c9..0cce5e4 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -920,6 +920,15 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_partition_wise_agg", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables partition-wise aggregation and grouping."),
+			NULL
+		},
+		&enable_partition_wise_agg,
+		true,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 6ca44f7..10fdf6d 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
 													 UpperRelationKind stage,
 													   RelOptInfo *input_rel,
-													 RelOptInfo *output_rel);
+													 RelOptInfo *output_rel,
+													 PathTarget *target);
 
 typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
 												   RangeTblEntry *target_rte,
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index e7949d37..7cd8e2c 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -68,6 +68,7 @@ extern bool enable_mergejoin;
 extern bool enable_hashjoin;
 extern bool enable_gathermerge;
 extern bool enable_partition_wise_join;
+extern bool enable_partition_wise_agg;
 extern int	constraint_exclusion;
 
 extern double clamp_row_est(double nrows);
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index f31b70e..77388cc 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -60,6 +60,8 @@ extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
 										Path *bitmapqual);
 extern void generate_partition_wise_join_paths(PlannerInfo *root,
 											   RelOptInfo *rel);
+extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
+						List *live_childrels);
 
 #ifdef OPTIMIZER_DEBUG
 extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
diff --git a/src/test/regress/expected/partition_agg.out b/src/test/regress/expected/partition_agg.out
new file mode 100644
index 0000000..315d396
--- /dev/null
+++ b/src/test/regress/expected/partition_agg.out
@@ -0,0 +1,192 @@
+--
+-- PARTITION_AGG
+-- Test partition-wise aggregation on partitioned tables
+--
+-- Enable partition-wise join, which by default is disabled.
+SET enable_partition_wise_join TO true;
+--
+-- Tests for list partitioned tables.
+--
+CREATE TABLE ptab1 (a int, b int, c text) PARTITION BY LIST(c);
+CREATE TABLE ptab1_p1 PARTITION OF ptab1 FOR VALUES IN ('0000', '0003', '0004', '0010');
+CREATE TABLE ptab1_p2 PARTITION OF ptab1 FOR VALUES IN ('0001', '0005', '0002', '0009');
+CREATE TABLE ptab1_p3 PARTITION OF ptab1 FOR VALUES IN ('0006', '0007', '0008', '0011');
+INSERT INTO ptab1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
+ANALYZE ptab1;
+ANALYZE ptab1_p1;
+ANALYZE ptab1_p2;
+ANALYZE ptab1_p3;
+-- TODO: This table is created only for testing the results. Remove once
+-- results are tested.
+CREATE TABLE uptab1 AS SELECT * FROM ptab1;
+ANALYZE uptab1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT c, sum(a), avg(b), COUNT(*) FROM ptab1 GROUP BY c ORDER BY 1, 2, 3;
+NOTICE:  partition-wise grouping is possible.
+                                  QUERY PLAN                                  
+------------------------------------------------------------------------------
+ Sort
+   Output: ptab1_p1.c, (sum(ptab1_p1.a)), (avg(ptab1_p1.b)), (count(*))
+   Sort Key: ptab1_p1.c, (sum(ptab1_p1.a)), (avg(ptab1_p1.b))
+   ->  Append
+         ->  HashAggregate
+               Output: ptab1_p1.c, sum(ptab1_p1.a), avg(ptab1_p1.b), count(*)
+               Group Key: ptab1_p1.c
+               ->  Seq Scan on public.ptab1_p1
+                     Output: ptab1_p1.c, ptab1_p1.a, ptab1_p1.b
+         ->  HashAggregate
+               Output: ptab1_p2.c, sum(ptab1_p2.a), avg(ptab1_p2.b), count(*)
+               Group Key: ptab1_p2.c
+               ->  Seq Scan on public.ptab1_p2
+                     Output: ptab1_p2.c, ptab1_p2.a, ptab1_p2.b
+         ->  HashAggregate
+               Output: ptab1_p3.c, sum(ptab1_p3.a), avg(ptab1_p3.b), count(*)
+               Group Key: ptab1_p3.c
+               ->  Seq Scan on public.ptab1_p3
+                     Output: ptab1_p3.c, ptab1_p3.a, ptab1_p3.b
+(19 rows)
+
+SELECT c, sum(a), avg(b), COUNT(*) FROM ptab1 GROUP BY c ORDER BY 1, 2, 3;
+NOTICE:  partition-wise grouping is possible.
+  c   |  sum  |         avg          | count 
+------+-------+----------------------+-------
+ 0000 |   600 |  24.0000000000000000 |    25
+ 0001 |  1850 |  74.0000000000000000 |    25
+ 0002 |  3100 | 124.0000000000000000 |    25
+ 0003 |  4350 | 174.0000000000000000 |    25
+ 0004 |  5600 | 224.0000000000000000 |    25
+ 0005 |  6850 | 274.0000000000000000 |    25
+ 0006 |  8100 | 324.0000000000000000 |    25
+ 0007 |  9350 | 374.0000000000000000 |    25
+ 0008 | 10600 | 424.0000000000000000 |    25
+ 0009 | 11850 | 474.0000000000000000 |    25
+ 0010 | 13100 | 524.0000000000000000 |    25
+ 0011 | 14350 | 574.0000000000000000 |    25
+(12 rows)
+
+SELECT c, sum(a), avg(b), COUNT(*) FROM uptab1 GROUP BY c ORDER BY 1, 2, 3;
+  c   |  sum  |         avg          | count 
+------+-------+----------------------+-------
+ 0000 |   600 |  24.0000000000000000 |    25
+ 0001 |  1850 |  74.0000000000000000 |    25
+ 0002 |  3100 | 124.0000000000000000 |    25
+ 0003 |  4350 | 174.0000000000000000 |    25
+ 0004 |  5600 | 224.0000000000000000 |    25
+ 0005 |  6850 | 274.0000000000000000 |    25
+ 0006 |  8100 | 324.0000000000000000 |    25
+ 0007 |  9350 | 374.0000000000000000 |    25
+ 0008 | 10600 | 424.0000000000000000 |    25
+ 0009 | 11850 | 474.0000000000000000 |    25
+ 0010 | 13100 | 524.0000000000000000 |    25
+ 0011 | 14350 | 574.0000000000000000 |    25
+(12 rows)
+
+-- JOIN query
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM ptab1 t1, ptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+NOTICE:  partition-wise grouping is possible.
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Sort
+   Output: t1.c, (sum(t1.a)), (avg(t1.b)), (count(*))
+   Sort Key: t1.c, (sum(t1.a)), (avg(t1.b))
+   ->  Append
+         ->  HashAggregate
+               Output: t1.c, sum(t1.a), avg(t1.b), count(*)
+               Group Key: t1.c
+               ->  Hash Join
+                     Output: t1.c, t1.a, t1.b
+                     Hash Cond: (t1.c = t2.c)
+                     ->  Seq Scan on public.ptab1_p1 t1
+                           Output: t1.c, t1.a, t1.b
+                     ->  Hash
+                           Output: t2.c
+                           ->  Seq Scan on public.ptab1_p1 t2
+                                 Output: t2.c
+         ->  HashAggregate
+               Output: t1_1.c, sum(t1_1.a), avg(t1_1.b), count(*)
+               Group Key: t1_1.c
+               ->  Hash Join
+                     Output: t1_1.c, t1_1.a, t1_1.b
+                     Hash Cond: (t1_1.c = t2_1.c)
+                     ->  Seq Scan on public.ptab1_p2 t1_1
+                           Output: t1_1.c, t1_1.a, t1_1.b
+                     ->  Hash
+                           Output: t2_1.c
+                           ->  Seq Scan on public.ptab1_p2 t2_1
+                                 Output: t2_1.c
+         ->  HashAggregate
+               Output: t1_2.c, sum(t1_2.a), avg(t1_2.b), count(*)
+               Group Key: t1_2.c
+               ->  Hash Join
+                     Output: t1_2.c, t1_2.a, t1_2.b
+                     Hash Cond: (t1_2.c = t2_2.c)
+                     ->  Seq Scan on public.ptab1_p3 t1_2
+                           Output: t1_2.c, t1_2.a, t1_2.b
+                     ->  Hash
+                           Output: t2_2.c
+                           ->  Seq Scan on public.ptab1_p3 t2_2
+                                 Output: t2_2.c
+(40 rows)
+
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM ptab1 t1, ptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+NOTICE:  partition-wise grouping is possible.
+  c   |  sum   |         avg          | count 
+------+--------+----------------------+-------
+ 0000 |  15000 |  24.0000000000000000 |   625
+ 0001 |  46250 |  74.0000000000000000 |   625
+ 0002 |  77500 | 124.0000000000000000 |   625
+ 0003 | 108750 | 174.0000000000000000 |   625
+ 0004 | 140000 | 224.0000000000000000 |   625
+ 0005 | 171250 | 274.0000000000000000 |   625
+ 0006 | 202500 | 324.0000000000000000 |   625
+ 0007 | 233750 | 374.0000000000000000 |   625
+ 0008 | 265000 | 424.0000000000000000 |   625
+ 0009 | 296250 | 474.0000000000000000 |   625
+ 0010 | 327500 | 524.0000000000000000 |   625
+ 0011 | 358750 | 574.0000000000000000 |   625
+(12 rows)
+
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM uptab1 t1, uptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+  c   |  sum   |         avg          | count 
+------+--------+----------------------+-------
+ 0000 |  15000 |  24.0000000000000000 |   625
+ 0001 |  46250 |  74.0000000000000000 |   625
+ 0002 |  77500 | 124.0000000000000000 |   625
+ 0003 | 108750 | 174.0000000000000000 |   625
+ 0004 | 140000 | 224.0000000000000000 |   625
+ 0005 | 171250 | 274.0000000000000000 |   625
+ 0006 | 202500 | 324.0000000000000000 |   625
+ 0007 | 233750 | 374.0000000000000000 |   625
+ 0008 | 265000 | 424.0000000000000000 |   625
+ 0009 | 296250 | 474.0000000000000000 |   625
+ 0010 | 327500 | 524.0000000000000000 |   625
+ 0011 | 358750 | 574.0000000000000000 |   625
+(12 rows)
+
+-- Negative testcase
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT COUNT(*) FROM ptab1 GROUP BY a;
+               QUERY PLAN                
+-----------------------------------------
+ HashAggregate
+   Output: count(*), ptab1.a
+   Group Key: ptab1.a
+   ->  Append
+         ->  Seq Scan on public.ptab1
+               Output: ptab1.a
+         ->  Seq Scan on public.ptab1_p1
+               Output: ptab1_p1.a
+         ->  Seq Scan on public.ptab1_p2
+               Output: ptab1_p2.a
+         ->  Seq Scan on public.ptab1_p3
+               Output: ptab1_p3.a
+(12 rows)
+
+-- Cleanup
+DROP TABLE uptab1;
+DROP TABLE ptab1_p3;
+DROP TABLE ptab1_p2;
+DROP TABLE ptab1_p1;
+DROP TABLE ptab1;
+RESET enable_partition_wise_join;
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 27f09fa..67c2041 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -1399,44 +1399,49 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+NOTICE:  partition-wise grouping is possible.
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  Sort
    Sort Key: t1.c, t3.c
-   ->  HashAggregate
-         Group Key: t1.c, t2.c, t3.c
-         ->  Result
-               ->  Append
-                     ->  Hash Join
-                           Hash Cond: (t1.c = t2.c)
-                           ->  Seq Scan on plt1_p1 t1
-                           ->  Hash
-                                 ->  Hash Join
-                                       Hash Cond: (t2.c = ltrim(t3.c, 'A'::text))
-                                       ->  Seq Scan on plt2_p1 t2
-                                       ->  Hash
-                                             ->  Seq Scan on plt1_e_p1 t3
-                     ->  Hash Join
-                           Hash Cond: (t1_1.c = t2_1.c)
-                           ->  Seq Scan on plt1_p2 t1_1
-                           ->  Hash
-                                 ->  Hash Join
-                                       Hash Cond: (t2_1.c = ltrim(t3_1.c, 'A'::text))
-                                       ->  Seq Scan on plt2_p2 t2_1
-                                       ->  Hash
-                                             ->  Seq Scan on plt1_e_p2 t3_1
-                     ->  Hash Join
-                           Hash Cond: (t1_2.c = t2_2.c)
-                           ->  Seq Scan on plt1_p3 t1_2
-                           ->  Hash
-                                 ->  Hash Join
-                                       Hash Cond: (t2_2.c = ltrim(t3_2.c, 'A'::text))
-                                       ->  Seq Scan on plt2_p3 t2_2
-                                       ->  Hash
-                                             ->  Seq Scan on plt1_e_p3 t3_2
-(33 rows)
+   ->  Append
+         ->  HashAggregate
+               Group Key: t1.c, t2.c, t3.c
+               ->  Hash Join
+                     Hash Cond: (t1.c = t2.c)
+                     ->  Seq Scan on plt1_p1 t1
+                     ->  Hash
+                           ->  Hash Join
+                                 Hash Cond: (t2.c = ltrim(t3.c, 'A'::text))
+                                 ->  Seq Scan on plt2_p1 t2
+                                 ->  Hash
+                                       ->  Seq Scan on plt1_e_p1 t3
+         ->  HashAggregate
+               Group Key: t1_1.c, t2_1.c, t3_1.c
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = t2_1.c)
+                     ->  Seq Scan on plt1_p2 t1_1
+                     ->  Hash
+                           ->  Hash Join
+                                 Hash Cond: (t2_1.c = ltrim(t3_1.c, 'A'::text))
+                                 ->  Seq Scan on plt2_p2 t2_1
+                                 ->  Hash
+                                       ->  Seq Scan on plt1_e_p2 t3_1
+         ->  HashAggregate
+               Group Key: t1_2.c, t2_2.c, t3_2.c
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = t2_2.c)
+                     ->  Seq Scan on plt1_p3 t1_2
+                     ->  Hash
+                           ->  Hash Join
+                                 Hash Cond: (t2_2.c = ltrim(t3_2.c, 'A'::text))
+                                 ->  Seq Scan on plt2_p3 t2_2
+                                 ->  Hash
+                                       ->  Seq Scan on plt1_e_p3 t3_2
+(36 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
+NOTICE:  partition-wise grouping is possible.
          avg          |         avg          |          avg          |  c   |  c   |   c   
 ----------------------+----------------------+-----------------------+------+------+-------
   24.0000000000000000 |  24.0000000000000000 |   48.0000000000000000 | 0000 | 0000 | A0000
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index cd1f7f3..f4cd466 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -81,11 +81,12 @@ select name, setting from pg_settings where name like 'enable%';
  enable_material            | on
  enable_mergejoin           | on
  enable_nestloop            | on
+ enable_partition_wise_agg  | on
  enable_partition_wise_join | off
  enable_seqscan             | on
  enable_sort                | on
  enable_tidscan             | on
-(13 rows)
+(14 rows)
 
 -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
 -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 966984d..56c07d3 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -104,6 +104,10 @@ test: publication subscription
 # Another group of parallel tests
 # ----------
 test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass partition_join multi_level_partition_join
+# TODO: should be added in parallel tests above, but before that need to make
+# sure we have unique objects to avoid any concurrency issues.
+test: partition_agg
+
 # ----------
 # Another group of parallel tests
 # NB: temp.sql does a reconnect which transiently uses 2 connections,
diff --git a/src/test/regress/sql/partition_agg.sql b/src/test/regress/sql/partition_agg.sql
new file mode 100644
index 0000000..df6dd9c
--- /dev/null
+++ b/src/test/regress/sql/partition_agg.sql
@@ -0,0 +1,48 @@
+--
+-- PARTITION_AGG
+-- Test partition-wise aggregation on partitioned tables
+--
+
+-- Enable partition-wise join, which by default is disabled.
+SET enable_partition_wise_join TO true;
+
+--
+-- Tests for list partitioned tables.
+--
+CREATE TABLE ptab1 (a int, b int, c text) PARTITION BY LIST(c);
+CREATE TABLE ptab1_p1 PARTITION OF ptab1 FOR VALUES IN ('0000', '0003', '0004', '0010');
+CREATE TABLE ptab1_p2 PARTITION OF ptab1 FOR VALUES IN ('0001', '0005', '0002', '0009');
+CREATE TABLE ptab1_p3 PARTITION OF ptab1 FOR VALUES IN ('0006', '0007', '0008', '0011');
+INSERT INTO ptab1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
+ANALYZE ptab1;
+ANALYZE ptab1_p1;
+ANALYZE ptab1_p2;
+ANALYZE ptab1_p3;
+-- TODO: This table is created only for testing the results. Remove once
+-- results are tested.
+CREATE TABLE uptab1 AS SELECT * FROM ptab1;
+ANALYZE uptab1;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT c, sum(a), avg(b), COUNT(*) FROM ptab1 GROUP BY c ORDER BY 1, 2, 3;
+SELECT c, sum(a), avg(b), COUNT(*) FROM ptab1 GROUP BY c ORDER BY 1, 2, 3;
+SELECT c, sum(a), avg(b), COUNT(*) FROM uptab1 GROUP BY c ORDER BY 1, 2, 3;
+
+-- JOIN query
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM ptab1 t1, ptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM ptab1 t1, ptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+SELECT t1.c, sum(t1.a), avg(t1.b), COUNT(*) FROM uptab1 t1, uptab1 t2 WHERE t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
+
+
+-- Negative testcase
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT COUNT(*) FROM ptab1 GROUP BY a;
+
+-- Cleanup
+DROP TABLE uptab1;
+DROP TABLE ptab1_p3;
+DROP TABLE ptab1_p2;
+DROP TABLE ptab1_p1;
+DROP TABLE ptab1;
+RESET enable_partition_wise_join;

Antonin Houska

ah@cybertec.at

almost 9 years ago

In reply to: Jeevan Chalke (#1)

Re: Partition-wise aggregation/grouping

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Declarative partitioning is supported in PostgreSQL 10 and work is already in
progress to support partition-wise joins. Here is a proposal for partition-wise
aggregation/grouping. Our initial performance measurement has shown a 7 times
improvement when partitions are on foreign servers and approximately 15% when
partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing aggregates
to be computed completely for each partition. Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce the
final aggregates. This technique improves performance because:

i. When partitions are located on a foreign server, we can push down the
aggregate to the foreign server.

ii. If the hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.

iii. Aggregation at the level of partitions can exploit properties of
partitions such as their indexes and storage.
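As a rough illustration of the first case described above (the group clause contains the partition key), here is a minimal C sketch. It is not code from the patch and does not use any PostgreSQL API: `Row`, `AggState`, `NGROUPS`, `aggregate` and `append_results` are hypothetical names, and group keys are assumed to be small integers so that a plain array can stand in for the hash table. It shows that aggregating each partition separately and then simply appending the results gives the same answer as aggregating the whole relation, as long as no group spans two partitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy setting: group keys are integers in 0..NGROUPS-1. */
#define NGROUPS 6

typedef struct Row { int key; int val; } Row;
typedef struct AggState { long count; long sum; } AggState;

/* Compute count(*) and sum(val) per key over one input, which may be
 * a whole relation or a single partition. */
void aggregate(const Row *rows, size_t nrows, AggState out[NGROUPS])
{
    memset(out, 0, NGROUPS * sizeof(AggState));
    for (size_t i = 0; i < nrows; i++)
    {
        assert(rows[i].key >= 0 && rows[i].key < NGROUPS);
        out[rows[i].key].count += 1;
        out[rows[i].key].sum += rows[i].val;
    }
}

/* Model of the Append step: copy every non-empty group from each
 * per-partition result. Returns 1 on success, or 0 if some group
 * appears in more than one partition, in which case a plain append
 * would be wrong and partial results would have to be combined. */
int append_results(AggState parts[][NGROUPS], size_t nparts,
                   AggState out[NGROUPS])
{
    memset(out, 0, NGROUPS * sizeof(AggState));
    for (size_t p = 0; p < nparts; p++)
    {
        for (int k = 0; k < NGROUPS; k++)
        {
            if (parts[p][k].count == 0)
                continue;
            if (out[k].count != 0)
                return 0;       /* group spans two partitions */
            out[k] = parts[p][k];
        }
    }
    return 1;
}
```

A return value of 0 from `append_results` corresponds to the other case in the text, where the grouping does not follow the partitioning and per-partition results are only partial.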

I suspect this overlaps with

/messages/by-id/29111.1483984605@localhost

I'm working on the next version of the patch, which will be able to aggregate
the result of both base relation scans and joins. I'm trying hard to make the
next version available before an urgent vacation that I'll have to take at
a random date between today and early April. I suggest that we coordinate the
effort; it's a lot of work in any case.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 9 years ago

In reply to: Antonin Houska (#2)

Re: Partition-wise aggregation/grouping

On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska <ah@cybertec.at> wrote:

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Declarative partitioning is supported in PostgreSQL 10 and work is already in
progress to support partition-wise joins. Here is a proposal for partition-wise
aggregation/grouping. Our initial performance measurement has shown a 7 times
improvement when partitions are on foreign servers and approximately 15% when
partitions are local.

Partition-wise aggregation/grouping computes aggregates for each partition
separately. If the group clause contains the partition key, all the rows
belonging to a given group come from one partition, thus allowing aggregates
to be computed completely for each partition. Otherwise, partial aggregates
computed for each partition are combined across the partitions to produce the
final aggregates. This technique improves performance because:

i. When partitions are located on a foreign server, we can push down the
aggregate to the foreign server.

ii. If the hash table for each partition fits in memory, but that for the whole
relation does not, each partition-wise aggregate can use an in-memory hash
table.

iii. Aggregation at the level of partitions can exploit properties of
partitions such as their indexes and storage.

I suspect this overlaps with

/messages/by-id/29111.1483984605@localhost

I'm working on the next version of the patch, which will be able to aggregate
the result of both base relation scans and joins. I'm trying hard to make the
next version available before an urgent vacation that I'll have to take at
a random date between today and early April. I suggest that we coordinate the
effort; it's a lot of work in any case.

IIUC, it seems that you are trying to push down the aggregation into the
joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that, your patch pushes aggregates on a base rel down to its
children, if any.

What I propose here, by contrast, is pushing down aggregation below the
Append node, keeping the join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1].
But I am planning to make this work for partitioned relations and not for
generic inheritance.

I treat these two as separate strategies/paths to be considered while
planning.

Our work will overlap when we are pushing down the aggregate on a partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single table). Once we are done with these
tasks, we may need to find a way to integrate them.

[1]: /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Antonin Houska

ah@cybertec.at

almost 9 years ago

In reply to: Jeevan Chalke (#3)

Re: Partition-wise aggregation/grouping

The promised new version of my patch is here:

/messages/by-id/9666.1491295317@localhost

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

On Tue, Mar 21, 2017 at 1:47 PM, Antonin Houska <ah@cybertec.at> wrote:

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

IIUC, it seems that you are trying to push down the aggregation into the
joining relations. So basically you are converting
Agg -> Join -> {scan1, scan2} into
FinalAgg -> Join -> {PartialAgg -> scan1, PartialAgg -> scan2}.
In addition to that, your patch pushes aggregates on a base rel down to its
children, if any.

What I propose here, by contrast, is pushing down aggregation below the
Append node, keeping the join/scan as is. So basically I am converting
Agg -> Append -> Join -> {scan1, scan2} into
Append -> Agg -> Join -> {scan1, scan2}.
This will require partition-wise join as posted in [1].
But I am planning to make this work for partitioned relations and not for
generic inheritance.

I treat these two as separate strategies/paths to be considered while planning.

Our work will overlap when we are pushing down the aggregate on a partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single table). Once we are done with these tasks,
we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

My patch also creates (partial) aggregation paths below the Append node,
but only expects SeqScan as input. Please check if your patch can be based on
this or if there's any conflict.

(I'll probably be unable to respond before Monday 04/17.)

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Antonin Houska

ah@cybertec.at

almost 9 years ago

In reply to: Antonin Houska (#4)

Re: Partition-wise aggregation/grouping

Antonin Houska <ah@cybertec.at> wrote:

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Our work will overlap when we are pushing down the aggregate on a partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single table). Once we are done with these tasks,
we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

My patch also creates (partial) aggregation paths below the Append node,
but only expects SeqScan as input. Please check if your patch can be based on
this or if there's any conflict.

Well, I haven't imposed any explicit restriction on the kind of path to be
aggregated below the Append path. Maybe the only thing to do is to merge my
patch with the "partition-wise join" patch (which I haven't checked yet).

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Antonin Houska (#5)

3 attachment(s)

Re: Partition-wise aggregation/grouping

Antonin Houska <ah@cybertec.at> wrote:

Antonin Houska <ah@cybertec.at> wrote:

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

Our work will overlap when we are pushing down the aggregate on a partitioned
base relation to its children/partitions.

I think you should continue working on pushing down aggregates onto the
joins/scans, whereas I will continue my work on pushing down aggregates to
partitions (joins as well as single table). Once we are done with these tasks,
we may need to find a way to integrate them.

[1] /messages/by-id/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=EaDTSA@mail.gmail.com

My patch also creates (partial) aggregation paths below the Append node,
but only expects SeqScan as input. Please check if your patch can be based on
this or if there's any conflict.

Well, I haven't imposed any explicit restriction on the kind of path to be
aggregated below the Append path. Maybe the only thing to do is to merge my
patch with the "partition-wise join" patch (which I haven't checked yet).

Attached is a diff that contains both patches merged. This is just to prove my
assumption; details are to be elaborated later. The attached scripts produce
the following plan in my environment:

                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Parallel Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Parallel Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2

Note that I had no better idea how to enforce the plan than hard-wiring zero
costs for the partial aggregation paths. This simulates the use case of partial
aggregation performed on a remote node (postgres_fdw). Other use cases may
exist, but I only wanted to prove the concept in terms of coding so far.
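
The Partial/Finalize split visible in the plan above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from either patch and not PostgreSQL's aggregate machinery: `AvgState`, `partial_avg`, `combine_avg` and `finalize_avg` are hypothetical names, and the transition state of avg() is modelled as a plain (sum, count) pair. Each partition produces a partial state, the states are combined above the Append, and only the final step performs the division.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transition state for avg(): running sum and row count. */
typedef struct AvgState { long sum; long count; } AvgState;

/* Partial aggregation over the rows of one partition. */
AvgState partial_avg(const int *vals, size_t nvals)
{
    AvgState s = {0, 0};
    for (size_t i = 0; i < nvals; i++)
    {
        s.sum += vals[i];
        s.count += 1;
    }
    return s;
}

/* Combine two partial states, as done above the Append node. */
AvgState combine_avg(AvgState a, AvgState b)
{
    AvgState s = {a.sum + b.sum, a.count + b.count};
    return s;
}

/* Final step: turn the combined state into the aggregate value. */
double finalize_avg(AvgState s)
{
    assert(s.count > 0);
    return (double) s.sum / (double) s.count;
}
```

Shipping the (sum, count) state rather than a per-partition average matters: averaging the per-partition averages would give the wrong result whenever partitions contribute different numbers of rows to a group.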

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Attachments:

agg_pushdown_partition_wise.diff (text/x-diff)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index b29549a..418c59a
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
*************** AND ftoptions @> array['fetch_size=60000
*** 7219,7221 ****
--- 7219,7341 ----
  (1 row)
  
  ROLLBACK;
+ -- ===================================================================
+ -- test partition-wise-joins
+ -- ===================================================================
+ SET enable_partition_wise_join=on;
+ CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
+ CREATE TABLE fprt1_p1 (LIKE fprt1);
+ CREATE TABLE fprt1_p2 (LIKE fprt1);
+ INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
+ INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
+ CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
+ ANALYZE fprt1;
+ ANALYZE fprt1_p1;
+ ANALYZE fprt1_p2;
+ CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
+ CREATE TABLE fprt2_p1 (LIKE fprt2);
+ CREATE TABLE fprt2_p2 (LIKE fprt2);
+ INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
+ INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
+ CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
+ ANALYZE fprt2;
+ ANALYZE fprt2_p1;
+ ANALYZE fprt2_p2;
+ -- inner join three tables
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+                                                      QUERY PLAN                                                     
+ --------------------------------------------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, t3.c
+    ->  Append
+          ->  Foreign Scan
+                Relations: ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)) INNER JOIN (public.ftprt1_p1 t3)
+          ->  Foreign Scan
+                Relations: ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)) INNER JOIN (public.ftprt1_p2 t3)
+ (7 rows)
+ 
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+   a  |  b  |  c   
+ -----+-----+------
+    0 |   0 | 0000
+  150 | 150 | 0003
+  250 | 250 | 0005
+  400 | 400 | 0008
+ (4 rows)
+ 
+ -- left outer join + nullable clasue
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+                                     QUERY PLAN                                     
+ -----------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+ (5 rows)
+ 
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+  a | b |  c   
+ ---+---+------
+  0 | 0 | 0000
+  2 |   | 
+  4 |   | 
+  6 | 6 | 0000
+  8 |   | 
+ (5 rows)
+ 
+ -- with whole-row reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+                                    QUERY PLAN                                    
+ ---------------------------------------------------------------------------------
+  Sort
+    Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
+ (7 rows)
+ 
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+        t1       |       t2       
+ ----------------+----------------
+  (0,0,0000)     | (0,0,0000)
+  (150,150,0003) | (150,150,0003)
+  (250,250,0005) | (250,250,0005)
+  (400,400,0008) | (400,400,0008)
+ (4 rows)
+ 
+ -- join with lateral reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+                                    QUERY PLAN                                    
+ ---------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, t1.b
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
+ (7 rows)
+ 
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+   a  |  b  
+ -----+-----
+    0 |   0
+  150 | 150
+  250 | 250
+  400 | 400
+ (4 rows)
+ 
+ RESET enable_partition_wise_join;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
new file mode 100644
index 423eb02..a275f55
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*************** WHERE ftrelid = 'table30000'::regclass
*** 1709,1711 ****
--- 1709,1764 ----
  AND ftoptions @> array['fetch_size=60000'];
  
  ROLLBACK;
+ 
+ -- ===================================================================
+ -- test partition-wise-joins
+ -- ===================================================================
+ SET enable_partition_wise_join=on;
+ 
+ CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
+ CREATE TABLE fprt1_p1 (LIKE fprt1);
+ CREATE TABLE fprt1_p2 (LIKE fprt1);
+ INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
+ INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
+ CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
+ ANALYZE fprt1;
+ ANALYZE fprt1_p1;
+ ANALYZE fprt1_p2;
+ 
+ CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
+ CREATE TABLE fprt2_p1 (LIKE fprt2);
+ CREATE TABLE fprt2_p2 (LIKE fprt2);
+ INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
+ INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
+ CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
+ ANALYZE fprt2;
+ ANALYZE fprt2_p1;
+ ANALYZE fprt2_p2;
+ 
+ -- inner join three tables
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ 
+ -- left outer join + nullable clasue
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ 
+ -- with whole-row reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+ 
+ -- join with lateral reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+ 
+ RESET enable_partition_wise_join;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index e02b0c8..c4d9228
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** ANY <replaceable class="parameter">num_s
*** 3643,3648 ****
--- 3643,3667 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-enable-partition-wise-join" xreflabel="enable_partition_wise_join">
+       <term><varname>enable_partition_wise_join</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>enable_partition_wise_join</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Enables or disables the query planner's use of partition-wise join
+         plans. When enabled, it spends time in creating paths for joins between
+         partitions and consumes memory to construct expression nodes to be used
+         for those joins, even if partition-wise join does not result in the
+         cheapest path. The time and memory increase exponentially with the
+         number of partitioned tables being joined and they increase linearly
+         with the number of partitions. The default is <literal>off</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       <varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
        <term><varname>enable_seqscan</varname> (<type>boolean</type>)
        <indexterm>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
new file mode 100644
index dbeaab5..ac8c2fa
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
*************** ShutdownForeignScan(ForeignScanState *no
*** 1270,1275 ****
--- 1270,1295 ----
     </para>
     </sect2>
  
+    <sect2 id="fdw-callbacks-reparameterize-paths">
+     <title>FDW Routines For reparameterization of paths</title>
+ 
+     <para>
+ <programlisting>
+ List *
+ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
+                                  RelOptInfo *child_rel);
+ </programlisting>
+     This function is called while converting a path parameterized by the
+     top-most parent of the given child relation <literal>child_rel</> to be
+     parameterized by the child relation. The function is used to reparameterize
+     any paths or translate any expression nodes saved in the given
+     <literal>fdw_private</> member of a <structname>ForeignPath</>. The
+     callback may use <literal>reparameterize_path_by_child</>,
+     <literal>adjust_appendrel_attrs</> or
+     <literal>adjust_appendrel_attrs_multilevel</> as required.
+     </para>
+    </sect2>
+ 
     </sect1>
  
     <sect1 id="fdw-helpers">
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
new file mode 100644
index e0d2665..c44bb0e
*** a/src/backend/catalog/partition.c
--- b/src/backend/catalog/partition.c
*************** static List *generate_partition_qual(Rel
*** 126,140 ****
  
  static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
  					 List *datums, bool lower);
! static int32 partition_rbound_cmp(PartitionKey key,
! 					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2);
! static int32 partition_rbound_datum_cmp(PartitionKey key,
! 						   Datum *rb_datums, RangeDatumContent *rb_content,
! 						   Datum *tuple_datums);
  
! static int32 partition_bound_cmp(PartitionKey key,
! 					PartitionBoundInfo boundinfo,
  					int offset, void *probe, bool probe_is_bound);
  static int partition_bound_bsearch(PartitionKey key,
  						PartitionBoundInfo boundinfo,
--- 126,141 ----
  
  static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
  					 List *datums, bool lower);
! static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
! 					 Oid *partcollation, Datum *datums1,
! 					 RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2);
! static int32 partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
! 						   Oid *partcollation, Datum *rb_datums,
! 						   RangeDatumContent *rb_content, Datum *tuple_datums);
  
! static int32 partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc,
! 					Oid *partcollation, PartitionBoundInfo boundinfo,
  					int offset, void *probe, bool probe_is_bound);
  static int partition_bound_bsearch(PartitionKey key,
  						PartitionBoundInfo boundinfo,
*************** RelationBuildPartitionDesc(Relation rel)
*** 592,598 ****
   * representation of partition bounds.
   */
  bool
! partition_bounds_equal(PartitionKey key,
  					   PartitionBoundInfo b1, PartitionBoundInfo b2)
  {
  	int			i;
--- 593,599 ----
   * representation of partition bounds.
   */
  bool
! partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
  					   PartitionBoundInfo b1, PartitionBoundInfo b2)
  {
  	int			i;
*************** partition_bounds_equal(PartitionKey key,
*** 613,619 ****
  	{
  		int			j;
  
! 		for (j = 0; j < key->partnatts; j++)
  		{
  			/* For range partitions, the bounds might not be finite. */
  			if (b1->content != NULL)
--- 614,620 ----
  	{
  		int			j;
  
! 		for (j = 0; j < partnatts; j++)
  		{
  			/* For range partitions, the bounds might not be finite. */
  			if (b1->content != NULL)
*************** partition_bounds_equal(PartitionKey key,
*** 642,649 ****
  			 * context.  datumIsEqual() should be simple enough to be safe.
  			 */
  			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
! 							  key->parttypbyval[j],
! 							  key->parttyplen[j]))
  				return false;
  		}
  
--- 643,649 ----
  			 * context.  datumIsEqual() should be simple enough to be safe.
  			 */
  			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
! 							  parttypbyval[j], parttyplen[j]))
  				return false;
  		}
  
*************** partition_bounds_equal(PartitionKey key,
*** 652,658 ****
  	}
  
  	/* There are ndatums+1 indexes in case of range partitions */
! 	if (key->strategy == PARTITION_STRATEGY_RANGE &&
  		b1->indexes[i] != b2->indexes[i])
  		return false;
  
--- 652,658 ----
  	}
  
  	/* There are ndatums+1 indexes in case of range partitions */
! 	if (b1->strategy == PARTITION_STRATEGY_RANGE &&
  		b1->indexes[i] != b2->indexes[i])
  		return false;
  
*************** check_new_partition_bound(char *relname,
*** 734,741 ****
  				 * First check if the resulting range would be empty with
  				 * specified lower and upper bounds
  				 */
! 				if (partition_rbound_cmp(key, lower->datums, lower->content, true,
! 										 upper) >= 0)
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
  					errmsg("cannot create range partition with empty range"),
--- 734,742 ----
  				 * First check if the resulting range would be empty with
  				 * specified lower and upper bounds
  				 */
! 				if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
! 										 key->partcollation, lower->datums,
! 										 lower->content, true, upper) >= 0)
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
  					errmsg("cannot create range partition with empty range"),
*************** qsort_partition_rbound_cmp(const void *a
*** 1865,1871 ****
  	PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
  	PartitionKey key = (PartitionKey) arg;
  
! 	return partition_rbound_cmp(key, b1->datums, b1->content, b1->lower, b2);
  }
  
  /*
--- 1866,1874 ----
  	PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
  	PartitionKey key = (PartitionKey) arg;
  
! 	return partition_rbound_cmp(key->partnatts, key->partsupfunc,
! 								key->partcollation, b1->datums, b1->content,
! 								b1->lower, b2);
  }
  
  /*
*************** qsort_partition_rbound_cmp(const void *a
*** 1875,1881 ****
   * content1, and lower1) is <=, =, >= the bound specified in *b2
   */
  static int32
! partition_rbound_cmp(PartitionKey key,
  					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2)
  {
--- 1878,1884 ----
   * content1, and lower1) is <=, =, >= the bound specified in *b2
   */
  static int32
! partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
  					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2)
  {
*************** partition_rbound_cmp(PartitionKey key,
*** 1885,1891 ****
  	RangeDatumContent *content2 = b2->content;
  	bool		lower2 = b2->lower;
  
! 	for (i = 0; i < key->partnatts; i++)
  	{
  		/*
  		 * First, handle cases involving infinity, which don't require
--- 1888,1894 ----
  	RangeDatumContent *content2 = b2->content;
  	bool		lower2 = b2->lower;
  
! 	for (i = 0; i < partnatts; i++)
  	{
  		/*
  		 * First, handle cases involving infinity, which don't require
*************** partition_rbound_cmp(PartitionKey key,
*** 1905,1912 ****
  		else if (content2[i] != RANGE_DATUM_FINITE)
  			return content2[i] == RANGE_DATUM_NEG_INF ? 1 : -1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
! 												 key->partcollation[i],
  												 datums1[i],
  												 datums2[i]));
  		if (cmpval != 0)
--- 1908,1915 ----
  		else if (content2[i] != RANGE_DATUM_FINITE)
  			return content2[i] == RANGE_DATUM_NEG_INF ? 1 : -1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
! 												 partcollation[i],
  												 datums1[i],
  												 datums2[i]));
  		if (cmpval != 0)
*************** partition_rbound_cmp(PartitionKey key,
*** 1932,1951 ****
   * rb_lower) <=, =, >= partition key of tuple (tuple_datums)
   */
  static int32
! partition_rbound_datum_cmp(PartitionKey key,
! 						   Datum *rb_datums, RangeDatumContent *rb_content,
! 						   Datum *tuple_datums)
  {
  	int			i;
  	int32		cmpval = -1;
  
! 	for (i = 0; i < key->partnatts; i++)
  	{
  		if (rb_content[i] != RANGE_DATUM_FINITE)
  			return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
! 												 key->partcollation[i],
  												 rb_datums[i],
  												 tuple_datums[i]));
  		if (cmpval != 0)
--- 1935,1954 ----
   * rb_lower) <=, =, >= partition key of tuple (tuple_datums)
   */
  static int32
! partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
! 						   Oid *partcollation, Datum *rb_datums,
! 						   RangeDatumContent *rb_content, Datum *tuple_datums)
  {
  	int			i;
  	int32		cmpval = -1;
  
! 	for (i = 0; i < partnatts; i++)
  	{
  		if (rb_content[i] != RANGE_DATUM_FINITE)
  			return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
! 												 partcollation[i],
  												 rb_datums[i],
  												 tuple_datums[i]));
  		if (cmpval != 0)
*************** partition_rbound_datum_cmp(PartitionKey
*** 1962,1978 ****
   * specified in *probe.
   */
  static int32
! partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
! 					int offset, void *probe, bool probe_is_bound)
  {
  	Datum	   *bound_datums = boundinfo->datums[offset];
  	int32		cmpval = -1;
  
! 	switch (key->strategy)
  	{
  		case PARTITION_STRATEGY_LIST:
! 			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
! 													 key->partcollation[0],
  													 bound_datums[0],
  													 *(Datum *) probe));
  			break;
--- 1965,1982 ----
   * specified in *probe.
   */
  static int32
! partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
! 					PartitionBoundInfo boundinfo, int offset, void *probe,
! 					bool probe_is_bound)
  {
  	Datum	   *bound_datums = boundinfo->datums[offset];
  	int32		cmpval = -1;
  
! 	switch (boundinfo->strategy)
  	{
  		case PARTITION_STRATEGY_LIST:
! 			cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
! 													 partcollation[0],
  													 bound_datums[0],
  													 *(Datum *) probe));
  			break;
*************** partition_bound_cmp(PartitionKey key, Pa
*** 1990,2001 ****
  					 */
  					bool		lower = boundinfo->indexes[offset] < 0;
  
! 					cmpval = partition_rbound_cmp(key,
! 												bound_datums, content, lower,
! 											  (PartitionRangeBound *) probe);
  				}
  				else
! 					cmpval = partition_rbound_datum_cmp(key,
  														bound_datums, content,
  														(Datum *) probe);
  				break;
--- 1994,2007 ----
  					 */
  					bool		lower = boundinfo->indexes[offset] < 0;
  
! 					cmpval = partition_rbound_cmp(partnatts, partsupfunc,
! 												  partcollation, bound_datums,
! 												  content, lower,
! 												(PartitionRangeBound *) probe);
  				}
  				else
! 					cmpval = partition_rbound_datum_cmp(partnatts, partsupfunc,
! 														partcollation,
  														bound_datums, content,
  														(Datum *) probe);
  				break;
*************** partition_bound_cmp(PartitionKey key, Pa
*** 2003,2009 ****
  
  		default:
  			elog(ERROR, "unexpected partition strategy: %d",
! 				 (int) key->strategy);
  	}
  
  	return cmpval;
--- 2009,2015 ----
  
  		default:
  			elog(ERROR, "unexpected partition strategy: %d",
! 				 (int) boundinfo->strategy);
  	}
  
  	return cmpval;
*************** partition_bound_bsearch(PartitionKey key
*** 2037,2043 ****
  		int32		cmpval;
  
  		mid = (lo + hi + 1) / 2;
! 		cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
  									 probe_is_bound);
  		if (cmpval <= 0)
  		{
--- 2043,2050 ----
  		int32		cmpval;
  
  		mid = (lo + hi + 1) / 2;
! 		cmpval = partition_bound_cmp(key->partnatts, key->partsupfunc,
! 									 key->partcollation, boundinfo, mid, probe,
  									 probe_is_bound);
  		if (cmpval <= 0)
  		{
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
new file mode 100644
index 5a34a46..717763d
*** a/src/backend/executor/execExpr.c
--- b/src/backend/executor/execExpr.c
*************** ExecInitExprRec(Expr *node, PlanState *p
*** 723,728 ****
--- 723,755 ----
  				break;
  			}
  
+ 		case T_GroupedVar:
+ 			/*
+ 			 * GroupedVar is treated as an aggregate if it appears in the
+ 			 * targetlist of Agg node, but as a normal variable elsewhere.
+ 			 */
+ 			if (parent && (IsA(parent, AggState)))
+ 			{
+ 				GroupedVar *gvar = (GroupedVar *) node;
+ 
+ 				/*
+ 				 * Currently GroupedVar can only represent partial aggregate.
+ 				 */
+ 				Assert(gvar->agg_partial != NULL);
+ 
+ 				ExecInitExprRec((Expr *) gvar->agg_partial, parent, state,
+ 								resv, resnull);
+ 				break;
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * set_plan_refs should have replaced GroupedVar in the
+ 				 * targetlist with an ordinary Var.
+ 				 */
+ 				elog(ERROR, "parent of GroupedVar is not Agg node");
+ 			}
+ 
  		case T_GroupingFunc:
  			{
  				GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
new file mode 100644
index c2b8618..c4cb4c0
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** find_unaggregated_cols_walker(Node *node
*** 1829,1834 ****
--- 1829,1845 ----
  		/* do not descend into aggregate exprs */
  		return false;
  	}
+ 	if (IsA(node, GroupedVar))
+ 	{
+ 		GroupedVar	   *gvar = (GroupedVar *) node;
+ 
+ 		/*
+ 		 * GroupedVar is currently used only for partial aggregation, so treat
+ 		 * it like an Aggref above.
+ 		 */
+ 		Assert(gvar->agg_partial != NULL);
+ 		return false;
+ 	}
  	return expression_tree_walker(node, find_unaggregated_cols_walker,
  								  (void *) colnos);
  }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
new file mode 100644
index 00a0fed..7d188ea
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyPlaceHolderVar(const PlaceHolderVar
*** 2206,2211 ****
--- 2206,2226 ----
  }
  
  /*
+  * _copyGroupedVar
+  */
+ static GroupedVar *
+ _copyGroupedVar(const GroupedVar *from)
+ {
+ 	GroupedVar *newnode = makeNode(GroupedVar);
+ 
+ 	COPY_NODE_FIELD(gvexpr);
+ 	COPY_NODE_FIELD(agg_partial);
+ 	COPY_SCALAR_FIELD(gvid);
+ 
+ 	return newnode;
+ }
+ 
+ /*
   * _copySpecialJoinInfo
   */
  static SpecialJoinInfo *
*************** copyObjectImpl(const void *from)
*** 4984,4989 ****
--- 4999,5007 ----
  		case T_PlaceHolderVar:
  			retval = _copyPlaceHolderVar(from);
  			break;
+ 		case T_GroupedVar:
+ 			retval = _copyGroupedVar(from);
+ 			break;
  		case T_SpecialJoinInfo:
  			retval = _copySpecialJoinInfo(from);
  			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
new file mode 100644
index 46573ae..f1dacd5
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*************** _equalPlaceHolderVar(const PlaceHolderVa
*** 874,879 ****
--- 874,887 ----
  }
  
  static bool
+ _equalGroupedVar(const GroupedVar *a, const GroupedVar *b)
+ {
+ 	COMPARE_SCALAR_FIELD(gvid);
+ 
+ 	return true;
+ }
+ 
+ static bool
  _equalSpecialJoinInfo(const SpecialJoinInfo *a, const SpecialJoinInfo *b)
  {
  	COMPARE_BITMAPSET_FIELD(min_lefthand);
*************** equal(const void *a, const void *b)
*** 3148,3153 ****
--- 3156,3164 ----
  		case T_PlaceHolderVar:
  			retval = _equalPlaceHolderVar(a, b);
  			break;
+ 		case T_GroupedVar:
+ 			retval = _equalGroupedVar(a, b);
+ 			break;
  		case T_SpecialJoinInfo:
  			retval = _equalSpecialJoinInfo(a, b);
  			break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
new file mode 100644
index 3e8189c..5c00e55
*** a/src/backend/nodes/nodeFuncs.c
--- b/src/backend/nodes/nodeFuncs.c
*************** exprType(const Node *expr)
*** 259,264 ****
--- 259,267 ----
  		case T_PlaceHolderVar:
  			type = exprType((Node *) ((const PlaceHolderVar *) expr)->phexpr);
  			break;
+ 		case T_GroupedVar:
+ 			type = exprType((Node *) ((const GroupedVar *) expr)->agg_partial);
+ 			break;
  		default:
  			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
  			type = InvalidOid;	/* keep compiler quiet */
*************** exprCollation(const Node *expr)
*** 931,936 ****
--- 934,942 ----
  		case T_PlaceHolderVar:
  			coll = exprCollation((Node *) ((const PlaceHolderVar *) expr)->phexpr);
  			break;
+ 		case T_GroupedVar:
+ 			coll = exprCollation((Node *) ((const GroupedVar *) expr)->gvexpr);
+ 			break;
  		default:
  			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
  			coll = InvalidOid;	/* keep compiler quiet */
*************** expression_tree_walker(Node *node,
*** 2198,2203 ****
--- 2204,2211 ----
  			break;
  		case T_PlaceHolderVar:
  			return walker(((PlaceHolderVar *) node)->phexpr, context);
+ 		case T_GroupedVar:
+ 			return walker(((GroupedVar *) node)->gvexpr, context);
  		case T_InferenceElem:
  			return walker(((InferenceElem *) node)->expr, context);
  		case T_AppendRelInfo:
*************** expression_tree_mutator(Node *node,
*** 2989,2994 ****
--- 2997,3012 ----
  				return (Node *) newnode;
  			}
  			break;
+ 		case T_GroupedVar:
+ 			{
+ 				GroupedVar *gv = (GroupedVar *) node;
+ 				GroupedVar *newnode;
+ 
+ 				FLATCOPY(newnode, gv, GroupedVar);
+ 				MUTATE(newnode->gvexpr, gv->gvexpr, Expr *);
+ 				MUTATE(newnode->agg_partial, gv->agg_partial, Aggref *);
+ 				return (Node *) newnode;
+ 			}
  		case T_InferenceElem:
  			{
  				InferenceElem *inferenceelemdexpr = (InferenceElem *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
new file mode 100644
index 28cef85..4b6ee30
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outPlannerInfo(StringInfo str, const Pl
*** 2186,2191 ****
--- 2186,2192 ----
  	WRITE_NODE_FIELD(pcinfo_list);
  	WRITE_NODE_FIELD(rowMarks);
  	WRITE_NODE_FIELD(placeholder_list);
+ 	WRITE_NODE_FIELD(grouped_var_list);
  	WRITE_NODE_FIELD(fkey_list);
  	WRITE_NODE_FIELD(query_pathkeys);
  	WRITE_NODE_FIELD(group_pathkeys);
*************** _outParamPathInfo(StringInfo str, const
*** 2408,2413 ****
--- 2409,2424 ----
  }
  
  static void
+ _outGroupedPathInfo(StringInfo str, const GroupedPathInfo *node)
+ {
+ 	WRITE_NODE_TYPE("GROUPEDPATHINFO");
+ 
+ 	WRITE_NODE_FIELD(target);
+ 	WRITE_NODE_FIELD(pathlist);
+ 	WRITE_NODE_FIELD(partial_pathlist);
+ }
+ 
+ static void
  _outRestrictInfo(StringInfo str, const RestrictInfo *node)
  {
  	WRITE_NODE_TYPE("RESTRICTINFO");
*************** _outPlaceHolderVar(StringInfo str, const
*** 2451,2456 ****
--- 2462,2477 ----
  }
  
  static void
+ _outGroupedVar(StringInfo str, const GroupedVar *node)
+ {
+ 	WRITE_NODE_TYPE("GROUPEDVAR");
+ 
+ 	WRITE_NODE_FIELD(gvexpr);
+ 	WRITE_NODE_FIELD(agg_partial);
+ 	WRITE_UINT_FIELD(gvid);
+ }
+ 
+ static void
  _outSpecialJoinInfo(StringInfo str, const SpecialJoinInfo *node)
  {
  	WRITE_NODE_TYPE("SPECIALJOININFO");
*************** outNode(StringInfo str, const void *obj)
*** 3996,4007 ****
--- 4017,4034 ----
  			case T_ParamPathInfo:
  				_outParamPathInfo(str, obj);
  				break;
+ 			case T_GroupedPathInfo:
+ 				_outGroupedPathInfo(str, obj);
+ 				break;
  			case T_RestrictInfo:
  				_outRestrictInfo(str, obj);
  				break;
  			case T_PlaceHolderVar:
  				_outPlaceHolderVar(str, obj);
  				break;
+ 			case T_GroupedVar:
+ 				_outGroupedVar(str, obj);
+ 				break;
  			case T_SpecialJoinInfo:
  				_outSpecialJoinInfo(str, obj);
  				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
new file mode 100644
index a883220..138f71c
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
*************** _readVar(void)
*** 522,527 ****
--- 522,542 ----
  }
  
  /*
+  * _readGroupedVar
+  */
+ static GroupedVar *
+ _readGroupedVar(void)
+ {
+ 	READ_LOCALS(GroupedVar);
+ 
+ 	READ_NODE_FIELD(gvexpr);
+ 	READ_NODE_FIELD(agg_partial);
+ 	READ_UINT_FIELD(gvid);
+ 
+ 	READ_DONE();
+ }
+ 
+ /*
   * _readConst
   */
  static Const *
*************** parseNodeString(void)
*** 2440,2445 ****
--- 2455,2462 ----
  		return_value = _readTableFunc();
  	else if (MATCH("VAR", 3))
  		return_value = _readVar();
+ 	else if (MATCH("GROUPEDVAR", 10))
+ 		return_value = _readGroupedVar();
  	else if (MATCH("CONST", 5))
  		return_value = _readConst();
  	else if (MATCH("PARAM", 5))
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
new file mode 100644
index fc0fca4..eee093f
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*************** be desirable to postpone the Gather stag
*** 1076,1078 ****
--- 1076,1105 ----
  plan as possible.  Expanding the range of cases in which more work can be
  pushed below the Gather (and costing them accurately) is likely to keep us
  busy for a long time to come.
+ 
+ Partition-wise joins
+ --------------------
+ A join between two similarly partitioned tables can be broken down into joins
+ between their matching partitions if there exists an equi-join condition
+ between the partition keys of the joining tables. The equi-join between
+ partition keys implies that all join partners for a given row in one
+ partitioned table must be in the corresponding partition of the other
+ partitioned table. The join partners can not be found in other partitions. This
+ condition allows the join between partitioned tables to be broken into joins
+ between the matching partitions. The resultant join is partitioned in the same
+ way as the joining relations, thus allowing an N-way join between similarly
+ partitioned tables having equi-join condition between their partition keys to
+ be broken down into N-way joins between their matching partitions. This
+ technique of breaking down a join between partitioned tables into joins between
+ their partitions is called partition-wise join. We will use the term
+ "partitioned relation" for both a partitioned table and a join between
+ partitioned tables that can use the partition-wise join technique.
+ 
+ Partitioning properties of a partitioned table are stored in the
+ PartitionSchemeData structure. The planner maintains a list of canonical
+ partition schemes (distinct PartitionSchemeData objects) so that any two
+ partitioned relations with the same partitioning scheme share the same
+ PartitionSchemeData object. This reduces memory consumed by PartitionSchemeData
+ objects and makes it easy to compare the partition schemes of joining
+ relations. RelOptInfos of partitioned relations hold partition key expressions
+ and the RelOptInfos of the partition relations of that relation.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
new file mode 100644
index b5cab0c..1ad910d
*** a/src/backend/optimizer/geqo/geqo_eval.c
--- b/src/backend/optimizer/geqo/geqo_eval.c
*************** merge_clump(PlannerInfo *root, List *clu
*** 264,271 ****
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
  				/* Create GatherPaths for any useful partial paths for rel */
! 				generate_gather_paths(root, joinrel);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
--- 264,279 ----
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
+ 
+ 				/*
+ 				 * Create "append" paths for partitioned joins. Do this before
+ 				 * creating GatherPaths so that partial "append" paths in
+ 				 * partitioned joins will be considered.
+ 				 */
+ 				generate_partition_wise_join_paths(root, joinrel);
+ 
  				/* Create GatherPaths for any useful partial paths for rel */
! 				generate_gather_paths(root, joinrel, false);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
new file mode 100644
index b93b4fc..83a2c37
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***************
*** 24,29 ****
--- 24,30 ----
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
  #include "foreign/fdwapi.h"
+ #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #ifdef OPTIMIZER_DEBUG
*************** set_rel_pathlist(PlannerInfo *root, RelO
*** 486,492 ****
  	 * we'll consider gathering partial paths for the parent appendrel.)
  	 */
  	if (rel->reloptkind == RELOPT_BASEREL)
! 		generate_gather_paths(root, rel);
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
--- 487,496 ----
  	 * we'll consider gathering partial paths for the parent appendrel.)
  	 */
  	if (rel->reloptkind == RELOPT_BASEREL)
! 	{
! 		generate_gather_paths(root, rel, false);
! 		generate_gather_paths(root, rel, true);
! 	}
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
*************** static void
*** 686,691 ****
--- 690,696 ----
  set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  {
  	Relids		required_outer;
+ 	Path		*seq_path;
  
  	/*
  	 * We don't support pushing join clauses into the quals of a seqscan, but
*************** set_plain_rel_pathlist(PlannerInfo *root
*** 694,708 ****
  	 */
  	required_outer = rel->lateral_relids;
  
! 	/* Consider sequential scan */
! 	add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
  
! 	/* If appropriate, consider parallel sequential scan */
  	if (rel->consider_parallel && required_outer == NULL)
  		create_plain_partial_paths(root, rel);
  
  	/* Consider index scans */
! 	create_index_paths(root, rel);
  
  	/* Consider TID scans */
  	create_tidscan_paths(root, rel);
--- 699,726 ----
  	 */
  	required_outer = rel->lateral_relids;
  
! 	/* Consider sequential scan, both plain and grouped. */
! 	seq_path = create_seqscan_path(root, rel, required_outer, 0);
! 	add_path(rel, seq_path, false);
! 	if (rel->gpi != NULL && required_outer == NULL)
! 		create_grouped_path(root, rel, seq_path, false, false, AGG_HASHED);
  
! 	/* If appropriate, consider parallel sequential scan (plain or grouped) */
  	if (rel->consider_parallel && required_outer == NULL)
  		create_plain_partial_paths(root, rel);
  
  	/* Consider index scans */
! 	create_index_paths(root, rel, false);
! 	if (rel->gpi != NULL)
! 	{
! 		/*
! 		 * TODO Instead of calling the whole clause-matching machinery twice
! 		 * (there should be no difference between plain and grouped paths from
! 		 * this point of view), consider returning a separate list of paths
! 		 * usable as grouped ones.
! 		 */
! 		create_index_paths(root, rel, true);
! 	}
  
  	/* Consider TID scans */
  	create_tidscan_paths(root, rel);
*************** static void
*** 716,721 ****
--- 734,740 ----
  create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	int			parallel_workers;
+ 	Path		*path;
  
  	parallel_workers = compute_parallel_worker(rel, rel->pages, -1);
  
*************** create_plain_partial_paths(PlannerInfo *
*** 724,730 ****
  		return;
  
  	/* Add an unordered partial path based on a parallel sequential scan. */
! 	add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_workers));
  }
  
  /*
--- 743,850 ----
  		return;
  
  	/* Add an unordered partial path based on a parallel sequential scan. */
! 	path = create_seqscan_path(root, rel, NULL, parallel_workers);
! 	add_partial_path(rel, path, false);
! 
! 	/*
! 	 * Do partial aggregation at base relation level if the relation is
! 	 * eligible for it.
! 	 */
! 	if (rel->gpi != NULL)
! 		create_grouped_path(root, rel, path, false, true, AGG_HASHED);
! }
! 
! /*
!  * Apply partial aggregation to a subpath and add the AggPath to the
!  * appropriate pathlist.
!  *
!  * "precheck" tells whether the aggregation path should first be checked using
!  * add_path_precheck().
!  *
!  * If "partial" is true, the resulting path is considered partial in terms of
!  * parallel execution.
!  *
!  * The path we create here shouldn't be parameterized because of the
!  * supposedly high startup cost of aggregation (whether due to building the hash
!  * table for the AGG_HASHED strategy or to an explicit sort for AGG_SORTED).
!  *
!  * XXX IndexPath as an input for AGG_SORTED might seem to be an exception, but
!  * aggregation of its output is only beneficial if it's performed by multiple
!  * workers, i.e. the resulting path is partial (Besides parallel aggregation,
!  * the other use case of aggregation push-down is aggregation performed on
!  * remote database, but that has nothing to do with IndexScan). And partial
!  * path cannot be parameterized because it's semantically wrong to use it on
!  * the inner side of NL join.
!  */
! void
! create_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
! 					bool precheck, bool partial, AggStrategy aggstrategy)
! {
! 	List    *group_clauses = NIL;
! 	List	*group_exprs = NIL;
! 	List	*agg_exprs = NIL;
! 	Path	*agg_path;
! 
! 	/*
! 	 * If the AggPath should be partial, the subpath must be too, and
! 	 * therefore the subpath is essentially parallel_safe.
! 	 */
! 	Assert(subpath->parallel_safe || !partial);
! 
! 	/*
! 	 * A grouped path should never be parameterized, so we're not supposed to
! 	 * receive a parameterized subpath.
! 	 */
! 	Assert(subpath->param_info == NULL);
! 
! 	/*
! 	 * Note that "partial" in the following function names refers to 2-stage
! 	 * aggregation, not to parallel processing.
! 	 */
! 	if (aggstrategy == AGG_HASHED)
! 		agg_path = (Path *) create_partial_agg_hashed_path(root, subpath,
! 														   true,
! 														   &group_clauses,
! 														   &group_exprs,
! 														   &agg_exprs,
! 														   subpath->rows);
! 	else if (aggstrategy == AGG_SORTED)
! 		agg_path = (Path *) create_partial_agg_sorted_path(root, subpath,
! 														   true,
! 														   &group_clauses,
! 														   &group_exprs,
! 														   &agg_exprs,
! 														   subpath->rows);
! 	else
! 		elog(ERROR, "unexpected strategy %d", aggstrategy);
! 
! 	/* Add the grouped path to the list of grouped base paths. */
! 	if (agg_path != NULL)
! 	{
! 		if (precheck)
! 		{
! 			List	*pathkeys;
! 
! 			/* AGG_HASHED is not supposed to generate sorted output. */
! 			pathkeys = aggstrategy == AGG_SORTED ? subpath->pathkeys : NIL;
! 
! 			if (!partial &&
! 				!add_path_precheck(rel, agg_path->startup_cost,
! 								   agg_path->total_cost, pathkeys, NULL,
! 								   true))
! 				return;
! 
! 			if (partial &&
! 				!add_partial_path_precheck(rel, agg_path->total_cost, pathkeys,
! 										   true))
! 				return;
! 		}
! 
! 		if (!partial)
! 			add_path(rel, (Path *) agg_path, true);
! 		else
! 			add_partial_path(rel, (Path *) agg_path, true);
! 	}
  }
  
  /*
*************** set_tablesample_rel_pathlist(PlannerInfo
*** 810,816 ****
  		path = (Path *) create_material_path(rel, path);
  	}
  
! 	add_path(rel, path);
  
  	/* For the moment, at least, there are no other paths to consider */
  }
--- 930,936 ----
  		path = (Path *) create_material_path(rel, path);
  	}
  
! 	add_path(rel, path, false);
  
  	/* For the moment, at least, there are no other paths to consider */
  }
*************** set_append_rel_size(PlannerInfo *root, R
*** 915,926 ****
  		childrel = find_base_rel(root, childRTindex);
  		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
  
  		/*
! 		 * We have to copy the parent's targetlist and quals to the child,
! 		 * with appropriate substitution of variables.  However, only the
! 		 * baserestrictinfo quals are needed before we can check for
! 		 * constraint exclusion; so do that first and then check to see if we
! 		 * can disregard this child.
  		 *
  		 * The child rel's targetlist might contain non-Var expressions, which
  		 * means that substitution into the quals could produce opportunities
--- 1035,1100 ----
  		childrel = find_base_rel(root, childRTindex);
  		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
  
+ 		if (rel->part_scheme)
+ 		{
+ 			AttrNumber		attno;
+ 
+ 			/*
+ 			 * For a partitioned table, individual partitions can participate
+ 			 * in the pair-wise joins. We need attr_needed data for building
+ 			 * targetlists of joins between partitions.
+ 			 */
+ 			for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ 			{
+ 				int	index = attno - rel->min_attr;
+ 				Relids	attr_needed = bms_copy(rel->attr_needed[index]);
+ 
+ 				/* System attributes do not need translation. */
+ 				if (attno <= 0)
+ 				{
+ 					Assert(rel->min_attr == childrel->min_attr);
+ 					childrel->attr_needed[index] = attr_needed;
+ 				}
+ 				else
+ 				{
+ 					Var *var = list_nth(appinfo->translated_vars,
+ 										attno - 1);
+ 					int child_index;
+ 
+ 					/*
+ 					 * Parent Var for a user defined attribute translates to
+ 					 * child Var.
+ 					 */
+ 					Assert(IsA(var, Var));
+ 
+ 					child_index = var->varattno - childrel->min_attr;
+ 					childrel->attr_needed[child_index] = attr_needed;
+ 				}
+ 			}
+ 		}
+ 
  		/*
! 		 * Copy/Modify targetlist. Even if this child is deemed empty, we need
! 		 * its targetlist in case it falls on the nullable side in a child-join
! 		 * because of partition-wise join.
! 		 *
! 		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
! 		 * expressions, which otherwise would not occur in a rel's targetlist.
! 		 * Code that might be looking at an appendrel child must cope with
! 		 * such.  (Normally, a rel's targetlist would only include Vars and
! 		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
! 		 * fields of childrel->reltarget; not clear if that would be useful.
! 		 */
! 		childrel->reltarget->exprs = (List *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) rel->reltarget->exprs,
! 								   1, &appinfo);
! 
! 		/*
! 		 * We have to copy the parent's quals to the child, with appropriate
! 		 * substitution of variables.  However, only the baserestrictinfo quals
! 		 * are needed before we can check for constraint exclusion; so do that
! 		 * first and then check to see if we can disregard this child.
  		 *
  		 * The child rel's targetlist might contain non-Var expressions, which
  		 * means that substitution into the quals could produce opportunities
*************** set_append_rel_size(PlannerInfo *root, R
*** 941,947 ****
  			Assert(IsA(rinfo, RestrictInfo));
  			childqual = adjust_appendrel_attrs(root,
  											   (Node *) rinfo->clause,
! 											   appinfo);
  			childqual = eval_const_expressions(root, childqual);
  			/* check for flat-out constant */
  			if (childqual && IsA(childqual, Const))
--- 1115,1121 ----
  			Assert(IsA(rinfo, RestrictInfo));
  			childqual = adjust_appendrel_attrs(root,
  											   (Node *) rinfo->clause,
! 											   1, &appinfo);
  			childqual = eval_const_expressions(root, childqual);
  			/* check for flat-out constant */
  			if (childqual && IsA(childqual, Const))
*************** set_append_rel_size(PlannerInfo *root, R
*** 1047,1070 ****
  			continue;
  		}
  
! 		/*
! 		 * CE failed, so finish copying/modifying targetlist and join quals.
! 		 *
! 		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
! 		 * expressions, which otherwise would not occur in a rel's targetlist.
! 		 * Code that might be looking at an appendrel child must cope with
! 		 * such.  (Normally, a rel's targetlist would only include Vars and
! 		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
! 		 * fields of childrel->reltarget; not clear if that would be useful.
! 		 */
  		childrel->joininfo = (List *)
  			adjust_appendrel_attrs(root,
  								   (Node *) rel->joininfo,
! 								   appinfo);
! 		childrel->reltarget->exprs = (List *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) rel->reltarget->exprs,
! 								   appinfo);
  
  		/*
  		 * We have to make child entries in the EquivalenceClass data
--- 1221,1231 ----
  			continue;
  		}
  
! 		/* CE failed, so finish copying/modifying join quals. */
  		childrel->joininfo = (List *)
  			adjust_appendrel_attrs(root,
  								   (Node *) rel->joininfo,
! 								   1, &appinfo);
  
  		/*
  		 * We have to make child entries in the EquivalenceClass data
*************** set_append_rel_size(PlannerInfo *root, R
*** 1079,1092 ****
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
- 		 * Note: we could compute appropriate attr_needed data for the child's
- 		 * variables, by transforming the parent's attr_needed through the
- 		 * translated_vars mapping.  However, currently there's no need
- 		 * because attr_needed is only examined for base relations not
- 		 * otherrels.  So we just leave the child's attr_needed empty.
- 		 */
- 
- 		/*
  		 * If parallelism is allowable for this query in general, see whether
  		 * it's allowable for this childrel in particular.  But if we've
  		 * already decided the appendrel is not parallel-safe as a whole,
--- 1240,1245 ----
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1281,1299 ****
  	bool		subpaths_valid = true;
  	List	   *partial_subpaths = NIL;
  	bool		partial_subpaths_valid = true;
  	List	   *all_child_pathkeys = NIL;
  	List	   *all_child_outers = NIL;
  	ListCell   *l;
  	List	   *partitioned_rels = NIL;
  	RangeTblEntry *rte;
  
! 	rte = planner_rt_fetch(rel->relid, root);
! 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		partitioned_rels = get_partitioned_child_rels(root, rel->relid);
! 		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(partitioned_rels) >= 1);
  	}
  
  	/*
  	 * For every non-dummy child, remember the cheapest path.  Also, identify
--- 1434,1460 ----
  	bool		subpaths_valid = true;
  	List	   *partial_subpaths = NIL;
  	bool		partial_subpaths_valid = true;
+ 	List	   *grouped_subpaths = NIL;
+ 	bool		grouped_subpaths_valid = true;
  	List	   *all_child_pathkeys = NIL;
  	List	   *all_child_outers = NIL;
  	ListCell   *l;
  	List	   *partitioned_rels = NIL;
  	RangeTblEntry *rte;
  
! 	if (rel->reloptkind == RELOPT_BASEREL)
  	{
! 		rte = planner_rt_fetch(rel->relid, root);
! 
! 		if (rte->relkind == RELKIND_PARTITIONED_TABLE)
! 		{
! 			partitioned_rels = get_partitioned_child_rels(root, rel->relid);
! 			/* The root partitioned table is included as a child rel */
! 			Assert(list_length(partitioned_rels) >= 1);
! 		}
  	}
+ 	else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
+ 		partitioned_rels = get_partitioned_child_rels_for_join(root, rel);
  
  	/*
  	 * For every non-dummy child, remember the cheapest path.  Also, identify
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1324,1329 ****
--- 1485,1521 ----
  			partial_subpaths_valid = false;
  
  		/*
+ 		 * For grouped paths, use only the unparameterized subpaths.
+ 		 *
+ 		 * XXX Consider if the parameterized subpaths should be processed
+ 		 * below. It's probably not useful for sequential scans (due to
+ 		 * repeated aggregation), but might be worthwhile for other child
+ 		 * nodes.
+ 		 */
+ 		if (childrel->gpi != NULL && childrel->gpi->pathlist != NIL)
+ 		{
+ 			Path	*path;
+ 
+ 			path = (Path *) linitial(childrel->gpi->pathlist);
+ 
+ 			/*
+ 			 * PoC only: Simulate remote aggregation, which seems to be the
+ 			 * typical use case for pushing the aggregation below Append node.
+ 			 */
+ 			path->startup_cost = 0.0;
+ 			path->total_cost = 0.0;
+ 
+ 			if (path->param_info == NULL)
+ 				grouped_subpaths = accumulate_append_subpath(grouped_subpaths,
+ 															 path);
+ 			else
+ 				grouped_subpaths_valid = false;
+ 		}
+ 		else
+ 			grouped_subpaths_valid = false;
+ 
+ 
+ 		/*
  		 * Collect lists of all the available path orderings and
  		 * parameterizations for all the children.  We use these as a
  		 * heuristic to indicate which sort orderings and parameterizations we
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1395,1401 ****
  	 */
  	if (subpaths_valid)
  		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
! 												  partitioned_rels));
  
  	/*
  	 * Consider an append of partial unordered, unparameterized partial paths.
--- 1587,1594 ----
  	 */
  	if (subpaths_valid)
  		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
! 					 partitioned_rels),
! 				 false);
  
  	/*
  	 * Consider an append of partial unordered, unparameterized partial paths.
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1422,1429 ****
  
  		/* Generate a partial append path. */
  		appendpath = create_append_path(rel, partial_subpaths, NULL,
! 										parallel_workers, partitioned_rels);
! 		add_partial_path(rel, (Path *) appendpath);
  	}
  
  	/*
--- 1615,1635 ----
  
  		/* Generate a partial append path. */
  		appendpath = create_append_path(rel, partial_subpaths, NULL,
! 										parallel_workers,
! 										partitioned_rels);
! 		add_partial_path(rel, (Path *) appendpath, false);
! 	}
! 
! 	/* TODO Also partial grouped paths? */
! 	if (grouped_subpaths_valid)
! 	{
! 		Path	*path;
! 
! 		path = (Path *) create_append_path(rel, grouped_subpaths, NULL, 0,
! 			partitioned_rels);
! 		/* pathtarget will produce the grouped relation. */
! 		path->pathtarget = rel->gpi->target;
! 		add_path(rel, path, true);
  	}
  
  	/*
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1476,1482 ****
  		if (subpaths_valid)
  			add_path(rel, (Path *)
  					 create_append_path(rel, subpaths, required_outer, 0,
! 										partitioned_rels));
  	}
  }
  
--- 1682,1689 ----
  		if (subpaths_valid)
  			add_path(rel, (Path *)
  					 create_append_path(rel, subpaths, required_outer, 0,
! 						 partitioned_rels),
! 					 false);
  	}
  }
  
*************** generate_mergeappend_paths(PlannerInfo *
*** 1572,1585 ****
  														startup_subpaths,
  														pathkeys,
  														NULL,
! 														partitioned_rels));
  		if (startup_neq_total)
  			add_path(rel, (Path *) create_merge_append_path(root,
  															rel,
  															total_subpaths,
  															pathkeys,
  															NULL,
! 															partitioned_rels));
  	}
  }
  
--- 1779,1794 ----
  														startup_subpaths,
  														pathkeys,
  														NULL,
! 														partitioned_rels),
! 				 false);
  		if (startup_neq_total)
  			add_path(rel, (Path *) create_merge_append_path(root,
  															rel,
  															total_subpaths,
  															pathkeys,
  															NULL,
! 															partitioned_rels),
! 					 false);
  	}
  }
  
*************** set_dummy_rel_pathlist(RelOptInfo *rel)
*** 1712,1718 ****
  	rel->pathlist = NIL;
  	rel->partial_pathlist = NIL;
  
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
  
  	/*
  	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
--- 1921,1927 ----
  	rel->pathlist = NIL;
  	rel->partial_pathlist = NIL;
  
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL), false);
  
  	/*
  	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
*************** set_subquery_pathlist(PlannerInfo *root,
*** 1926,1932 ****
  		/* Generate outer path using this subpath */
  		add_path(rel, (Path *)
  				 create_subqueryscan_path(root, rel, subpath,
! 										  pathkeys, required_outer));
  	}
  }
  
--- 2135,2141 ----
  		/* Generate outer path using this subpath */
  		add_path(rel, (Path *)
  				 create_subqueryscan_path(root, rel, subpath,
! 										  pathkeys, required_outer), false);
  	}
  }
  
*************** set_function_pathlist(PlannerInfo *root,
*** 1995,2001 ****
  
  	/* Generate appropriate path */
  	add_path(rel, create_functionscan_path(root, rel,
! 										   pathkeys, required_outer));
  }
  
  /*
--- 2204,2210 ----
  
  	/* Generate appropriate path */
  	add_path(rel, create_functionscan_path(root, rel,
! 										   pathkeys, required_outer), false);
  }
  
  /*
*************** set_values_pathlist(PlannerInfo *root, R
*** 2015,2021 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
  }
  
  /*
--- 2224,2230 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_valuesscan_path(root, rel, required_outer), false);
  }
  
  /*
*************** set_tablefunc_pathlist(PlannerInfo *root
*** 2036,2042 ****
  
  	/* Generate appropriate path */
  	add_path(rel, create_tablefuncscan_path(root, rel,
! 											required_outer));
  }
  
  /*
--- 2245,2251 ----
  
  	/* Generate appropriate path */
  	add_path(rel, create_tablefuncscan_path(root, rel,
! 											required_outer), false);
  }
  
  /*
*************** set_cte_pathlist(PlannerInfo *root, RelO
*** 2102,2108 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_ctescan_path(root, rel, required_outer));
  }
  
  /*
--- 2311,2317 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_ctescan_path(root, rel, required_outer), false);
  }
  
  /*
*************** set_namedtuplestore_pathlist(PlannerInfo
*** 2129,2135 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_namedtuplestorescan_path(root, rel, required_outer));
  
  	/* Select cheapest path (pretty easy in this case...) */
  	set_cheapest(rel);
--- 2338,2345 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_namedtuplestorescan_path(root, rel, required_outer),
! 			 false);
  
  	/* Select cheapest path (pretty easy in this case...) */
  	set_cheapest(rel);
*************** set_worktable_pathlist(PlannerInfo *root
*** 2182,2188 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
  }
  
  /*
--- 2392,2399 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_worktablescan_path(root, rel, required_outer),
! 			 false);
  }
  
  /*
*************** set_worktable_pathlist(PlannerInfo *root
*** 2195,2208 ****
   * path that some GatherPath or GatherMergePath has a reference to.)
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
  	ListCell   *lc;
  
  	/* If there are no partial paths, there's nothing to do here. */
! 	if (rel->partial_pathlist == NIL)
  		return;
  
  	/*
--- 2406,2426 ----
   * path that some GatherPath or GatherMergePath has a reference to.)
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool grouped)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
+ 	List	   *pathlist = NIL;
+ 	PathTarget *partial_target;
  	ListCell   *lc;
  
+ 	if (!grouped)
+ 		pathlist = rel->partial_pathlist;
+ 	else if (rel->gpi != NULL)
+ 		pathlist = rel->gpi->partial_pathlist;
+ 
  	/* If there are no partial paths, there's nothing to do here. */
! 	if (pathlist == NIL)
  		return;
  
  	/*
*************** generate_gather_paths(PlannerInfo *root,
*** 2210,2226 ****
  	 * path of interest: the cheapest one.  That will be the one at the front
  	 * of partial_pathlist because of the way add_partial_path works.
  	 */
! 	cheapest_partial_path = linitial(rel->partial_pathlist);
  	simple_gather_path = (Path *)
! 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  						   NULL, NULL);
! 	add_path(rel, simple_gather_path);
  
  	/*
  	 * For each useful ordering, we can consider an order-preserving Gather
  	 * Merge.
  	 */
! 	foreach (lc, rel->partial_pathlist)
  	{
  		Path   *subpath = (Path *) lfirst(lc);
  		GatherMergePath   *path;
--- 2428,2450 ----
  	 * path of interest: the cheapest one.  That will be the one at the front
  	 * of partial_pathlist because of the way add_partial_path works.
  	 */
! 	cheapest_partial_path = linitial(pathlist);
! 
! 	if (!grouped)
! 		partial_target = rel->reltarget;
! 	else if (rel->gpi != NULL)
! 		partial_target = rel->gpi->target;
! 
  	simple_gather_path = (Path *)
! 		create_gather_path(root, rel, cheapest_partial_path, partial_target,
  						   NULL, NULL);
! 	add_path(rel, simple_gather_path, grouped);
  
  	/*
  	 * For each useful ordering, we can consider an order-preserving Gather
  	 * Merge.
  	 */
! 	foreach (lc, pathlist)
  	{
  		Path   *subpath = (Path *) lfirst(lc);
  		GatherMergePath   *path;
*************** generate_gather_paths(PlannerInfo *root,
*** 2228,2236 ****
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
  										subpath->pathkeys, NULL, NULL);
! 		add_path(rel, &path->path);
  	}
  }
  
--- 2452,2460 ----
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = create_gather_merge_path(root, rel, subpath, partial_target,
  										subpath->pathkeys, NULL, NULL);
! 		add_path(rel, &path->path, grouped);
  	}
  }
  
*************** standard_join_search(PlannerInfo *root,
*** 2388,2402 ****
  		 * Run generate_gather_paths() for each just-processed joinrel.  We
  		 * could not do this earlier because both regular and partial paths
  		 * can get added to a particular joinrel at multiple times within
! 		 * join_search_one_level.  After that, we're done creating paths for
! 		 * the joinrel, so run set_cheapest().
  		 */
  		foreach(lc, root->join_rel_level[lev])
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
  			/* Create GatherPaths for any useful partial paths for rel */
! 			generate_gather_paths(root, rel);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
--- 2612,2641 ----
  		 * Run generate_gather_paths() for each just-processed joinrel.  We
  		 * could not do this earlier because both regular and partial paths
  		 * can get added to a particular joinrel at multiple times within
! 		 * join_search_one_level.
! 		 *
! 		 * Similarly, create paths for joinrels that use the partition-wise join
! 		 * technique. We could not do this earlier because paths can get added
! 		 * to a particular child-join at multiple times within
! 		 * join_search_one_level.
! 		 *
! 		 * After that, we're done creating paths for the joinrel, so run
! 		 * set_cheapest().
  		 */
  		foreach(lc, root->join_rel_level[lev])
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
+ 			/*
+ 			 * Create paths for partition-wise joins. Do this before creating
+ 			 * GatherPaths so that partial "append" paths in partitioned joins
+ 			 * will be considered.
+ 			 */
+ 			generate_partition_wise_join_paths(root, rel);
+ 
  			/* Create GatherPaths for any useful partial paths for rel */
! 			generate_gather_paths(root, rel, false);
! 			generate_gather_paths(root, rel, true);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
*************** create_partial_bitmap_paths(PlannerInfo
*** 3047,3053 ****
  		return;
  
  	add_partial_path(rel, (Path *) create_bitmap_heap_path(root, rel,
! 					bitmapqual, rel->lateral_relids, 1.0, parallel_workers));
  }
  
  /*
--- 3286,3292 ----
  		return;
  
  	add_partial_path(rel, (Path *) create_bitmap_heap_path(root, rel,
! 				   bitmapqual, rel->lateral_relids, 1.0, parallel_workers), false);
  }
  
  /*
*************** compute_parallel_worker(RelOptInfo *rel,
*** 3142,3147 ****
--- 3381,3454 ----
  	return parallel_workers;
  }
  
+ /*
+  * generate_partition_wise_join_paths
+  *
+  * 		Create paths representing partition-wise join for given partitioned
+  * 		join relation.
+  *
+  * This must not be called until after we are done adding paths for all
+  * child-joins. (Otherwise, add_path might delete a path that some "append"
+  * path has a reference to.)
+  */
+ void
+ generate_partition_wise_join_paths(PlannerInfo *root, RelOptInfo *rel)
+ {
+ 	List   *live_children = NIL;
+ 	int		cnt_parts;
+ 	int		num_parts;
+ 	RelOptInfo	   **part_rels;
+ 
+ 	/* Handle only join relations. */
+ 	if (!IS_JOIN_REL(rel))
+ 		return;
+ 
+ 	/* If the relation is not partitioned or is proven dummy, nothing to do. */
+ 	if (!rel->part_scheme || !rel->boundinfo || IS_DUMMY_REL(rel))
+ 		return;
+ 
+ 	/* A partitioned join should have RelOptInfos of the child-joins. */
+ 	Assert(rel->part_rels && rel->nparts > 0);
+ 
+ 	/* Guard against stack overflow due to overly deep partition hierarchy. */
+ 	check_stack_depth();
+ 
+ 	num_parts = rel->nparts;
+ 	part_rels = rel->part_rels;
+ 
+ 	/* Collect non-dummy child-joins. */
+ 	for (cnt_parts = 0; cnt_parts < num_parts; cnt_parts++)
+ 	{
+ 		RelOptInfo *child_rel = part_rels[cnt_parts];
+ 
+ 		/* Add partition-wise join paths for partitioned child-joins. */
+ 		generate_partition_wise_join_paths(root, child_rel);
+ 
+ 		/* Dummy children will not be scanned, so ignore those. */
+ 		if (IS_DUMMY_REL(child_rel))
+ 			continue;
+ 
+ 		set_cheapest(child_rel);
+ 
+ #ifdef OPTIMIZER_DEBUG
+ 		debug_print_rel(root, child_rel);
+ #endif
+ 
+ 		live_children = lappend(live_children, child_rel);
+ 	}
+ 
+ 	/* If all child-joins are dummy, parent join is also dummy. */
+ 	if (!live_children)
+ 	{
+ 		mark_dummy_rel(rel);
+ 		return;
+ 	}
+ 
+ 	/* Add "append" paths containing paths from child-joins. */
+ 	add_paths_to_append_rel(root, rel, live_children);
+ 	list_free(live_children);
+ }
+ 
  
  /*****************************************************************************
   *			DEBUG SUPPORT
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
new file mode 100644
index 52643d0..f278b77
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** bool		enable_material = true;
*** 127,132 ****
--- 127,133 ----
  bool		enable_mergejoin = true;
  bool		enable_hashjoin = true;
  bool		enable_gathermerge = true;
+ bool		enable_partition_wise_join = false;
  
  typedef struct
  {
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
new file mode 100644
index 67bd760..780ea04
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*************** generate_join_implied_equalities_broken(
*** 1329,1335 ****
  	if (IS_OTHER_REL(inner_rel) && result != NIL)
  		result = (List *) adjust_appendrel_attrs_multilevel(root,
  															(Node *) result,
! 															inner_rel);
  
  	return result;
  }
--- 1329,1336 ----
  	if (IS_OTHER_REL(inner_rel) && result != NIL)
  		result = (List *) adjust_appendrel_attrs_multilevel(root,
  															(Node *) result,
! 															inner_rel->relids,
! 												 inner_rel->top_parent_relids);
  
  	return result;
  }
*************** add_child_rel_equivalences(PlannerInfo *
*** 2112,2118 ****
  				child_expr = (Expr *)
  					adjust_appendrel_attrs(root,
  										   (Node *) cur_em->em_expr,
! 										   appinfo);
  
  				/*
  				 * Transform em_relids to match.  Note we do *not* do
--- 2113,2119 ----
  				child_expr = (Expr *)
  					adjust_appendrel_attrs(root,
  										   (Node *) cur_em->em_expr,
! 										   1, &appinfo);
  
  				/*
  				 * Transform em_relids to match.  Note we do *not* do
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
new file mode 100644
index 6e4bae8..a6fa713
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***************
*** 32,37 ****
--- 32,38 ----
  #include "optimizer/predtest.h"
  #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "utils/builtins.h"
  #include "utils/bytea.h"
*************** static bool eclass_already_used(Equivale
*** 107,119 ****
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths);
  static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				  IndexOptInfo *index, IndexClauseSet *clauses,
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop);
  static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
  				   List *clauses, List *other_clauses);
  static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
--- 108,121 ----
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths, bool grouped);
  static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				  IndexOptInfo *index, IndexClauseSet *clauses,
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				   bool *skip_lower_saop,
! 				   bool grouped);
  static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
  				   List *clauses, List *other_clauses);
  static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
*************** static Const *string_to_const(const char
*** 229,235 ****
   * as meaning "unparameterized so far as the indexquals are concerned".
   */
  void
! create_index_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	List	   *indexpaths;
  	List	   *bitindexpaths;
--- 231,237 ----
   * as meaning "unparameterized so far as the indexquals are concerned".
   */
  void
! create_index_paths(PlannerInfo *root, RelOptInfo *rel, bool grouped)
  {
  	List	   *indexpaths;
  	List	   *bitindexpaths;
*************** create_index_paths(PlannerInfo *root, Re
*** 274,281 ****
  		 * non-parameterized paths.  Plain paths go directly to add_path(),
  		 * bitmap paths are added to bitindexpaths to be handled below.
  		 */
! 		get_index_paths(root, rel, index, &rclauseset,
! 						&bitindexpaths);
  
  		/*
  		 * Identify the join clauses that can match the index.  For the moment
--- 276,283 ----
  		 * non-parameterized paths.  Plain paths go directly to add_path(),
  		 * bitmap paths are added to bitindexpaths to be handled below.
  		 */
! 		get_index_paths(root, rel, index, &rclauseset, &bitindexpaths,
! 						grouped);
  
  		/*
  		 * Identify the join clauses that can match the index.  For the moment
*************** create_index_paths(PlannerInfo *root, Re
*** 338,344 ****
  		bitmapqual = choose_bitmap_and(root, rel, bitindexpaths);
  		bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  										rel->lateral_relids, 1.0, 0);
! 		add_path(rel, (Path *) bpath);
  
  		/* create a partial bitmap heap path */
  		if (rel->consider_parallel && rel->lateral_relids == NULL)
--- 340,346 ----
  		bitmapqual = choose_bitmap_and(root, rel, bitindexpaths);
  		bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  										rel->lateral_relids, 1.0, 0);
! 		add_path(rel, (Path *) bpath, false);
  
  		/* create a partial bitmap heap path */
  		if (rel->consider_parallel && rel->lateral_relids == NULL)
*************** create_index_paths(PlannerInfo *root, Re
*** 415,421 ****
  			loop_count = get_loop_count(root, rel->relid, required_outer);
  			bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  											required_outer, loop_count, 0);
! 			add_path(rel, (Path *) bpath);
  		}
  	}
  }
--- 417,423 ----
  			loop_count = get_loop_count(root, rel->relid, required_outer);
  			bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  											required_outer, loop_count, 0);
! 			add_path(rel, (Path *) bpath, false);
  		}
  	}
  }
*************** get_join_index_paths(PlannerInfo *root,
*** 667,673 ****
  	Assert(clauseset.nonempty);
  
  	/* Build index path(s) using the collected set of clauses */
! 	get_index_paths(root, rel, index, &clauseset, bitindexpaths);
  
  	/*
  	 * Remember we considered paths for this set of relids.  We use lcons not
--- 669,675 ----
  	Assert(clauseset.nonempty);
  
  	/* Build index path(s) using the collected set of clauses */
! 	get_index_paths(root, rel, index, &clauseset, bitindexpaths, false);
  
  	/*
  	 * Remember we considered paths for this set of relids.  We use lcons not
*************** bms_equal_any(Relids relids, List *relid
*** 736,742 ****
  static void
  get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths)
  {
  	List	   *indexpaths;
  	bool		skip_nonnative_saop = false;
--- 738,744 ----
  static void
  get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths, bool grouped)
  {
  	List	   *indexpaths;
  	bool		skip_nonnative_saop = false;
*************** get_index_paths(PlannerInfo *root, RelOp
*** 754,760 ****
  								   index->predOK,
  								   ST_ANYSCAN,
  								   &skip_nonnative_saop,
! 								   &skip_lower_saop);
  
  	/*
  	 * If we skipped any lower-order ScalarArrayOpExprs on an index with an AM
--- 756,762 ----
  								   index->predOK,
  								   ST_ANYSCAN,
  								   &skip_nonnative_saop,
! 								   &skip_lower_saop, grouped);
  
  	/*
  	 * If we skipped any lower-order ScalarArrayOpExprs on an index with an AM
*************** get_index_paths(PlannerInfo *root, RelOp
*** 769,775 ****
  												   index->predOK,
  												   ST_ANYSCAN,
  												   &skip_nonnative_saop,
! 												   NULL));
  	}
  
  	/*
--- 771,777 ----
  												   index->predOK,
  												   ST_ANYSCAN,
  												   &skip_nonnative_saop,
! 												   NULL, grouped));
  	}
  
  	/*
*************** get_index_paths(PlannerInfo *root, RelOp
*** 789,797 ****
  		IndexPath  *ipath = (IndexPath *) lfirst(lc);
  
  		if (index->amhasgettuple)
! 			add_path(rel, (Path *) ipath);
  
! 		if (index->amhasgetbitmap &&
  			(ipath->path.pathkeys == NIL ||
  			 ipath->indexselectivity < 1.0))
  			*bitindexpaths = lappend(*bitindexpaths, ipath);
--- 791,799 ----
  		IndexPath  *ipath = (IndexPath *) lfirst(lc);
  
  		if (index->amhasgettuple)
! 			add_path(rel, (Path *) ipath, grouped);
  
! 		if (!grouped && index->amhasgetbitmap &&
  			(ipath->path.pathkeys == NIL ||
  			 ipath->indexselectivity < 1.0))
  			*bitindexpaths = lappend(*bitindexpaths, ipath);
*************** get_index_paths(PlannerInfo *root, RelOp
*** 802,815 ****
  	 * natively, generate bitmap scan paths relying on executor-managed
  	 * ScalarArrayOpExpr.
  	 */
! 	if (skip_nonnative_saop)
  	{
  		indexpaths = build_index_paths(root, rel,
  									   index, clauses,
  									   false,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL);
  		*bitindexpaths = list_concat(*bitindexpaths, indexpaths);
  	}
  }
--- 804,818 ----
  	 * natively, generate bitmap scan paths relying on executor-managed
  	 * ScalarArrayOpExpr.
  	 */
! 	if (!grouped && skip_nonnative_saop)
  	{
  		indexpaths = build_index_paths(root, rel,
  									   index, clauses,
  									   false,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL,
! 									   false);
  		*bitindexpaths = list_concat(*bitindexpaths, indexpaths);
  	}
  }
*************** build_index_paths(PlannerInfo *root, Rel
*** 861,867 ****
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop)
  {
  	List	   *result = NIL;
  	IndexPath  *ipath;
--- 864,870 ----
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop, bool grouped)
  {
  	List	   *result = NIL;
  	IndexPath  *ipath;
*************** build_index_paths(PlannerInfo *root, Rel
*** 878,883 ****
--- 881,890 ----
  	bool		index_is_ordered;
  	bool		index_only_scan;
  	int			indexcol;
+ 	bool		can_agg_sorted;
+ 	List		*group_clauses, *group_exprs, *agg_exprs;
+ 	AggPath		*agg_path;
+ 	double		agg_input_rows;
  
  	/*
  	 * Check that index supports the desired scan type(s)
*************** build_index_paths(PlannerInfo *root, Rel
*** 891,896 ****
--- 898,906 ----
  		case ST_BITMAPSCAN:
  			if (!index->amhasgetbitmap)
  				return NIL;
+ 
+ 			if (grouped)
+ 				return NIL;
  			break;
  		case ST_ANYSCAN:
  			/* either or both are OK */
*************** build_index_paths(PlannerInfo *root, Rel
*** 1032,1037 ****
--- 1042,1051 ----
  	 * later merging or final output ordering, OR the index has a useful
  	 * predicate, OR an index-only scan is possible.
  	 */
+ 	can_agg_sorted = true;
+ 	group_clauses = NIL;
+ 	group_exprs = NIL;
+ 	agg_exprs = NIL;
  	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
  		index_only_scan)
  	{
*************** build_index_paths(PlannerInfo *root, Rel
*** 1048,1054 ****
  								  outer_relids,
  								  loop_count,
  								  false);
! 		result = lappend(result, ipath);
  
  		/*
  		 * If appropriate, consider parallel index scan.  We don't allow
--- 1062,1086 ----
  								  outer_relids,
  								  loop_count,
  								  false);
! 		if (!grouped)
! 			result = lappend(result, ipath);
! 		else
! 		{
! 			/* TODO Double-check if this is the correct input value. */
! 			agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 			agg_path = create_partial_agg_sorted_path(root, (Path *) ipath,
! 													  true,
! 													  &group_clauses,
! 													  &group_exprs,
! 													  &agg_exprs,
! 													  agg_input_rows);
! 
! 			if (agg_path != NULL)
! 				result = lappend(result, agg_path);
! 			else
! 				can_agg_sorted = false;
! 		}
  
  		/*
  		 * If appropriate, consider parallel index scan.  We don't allow
*************** build_index_paths(PlannerInfo *root, Rel
*** 1077,1083 ****
  			 * using parallel workers, just free it.
  			 */
  			if (ipath->path.parallel_workers > 0)
! 				add_partial_path(rel, (Path *) ipath);
  			else
  				pfree(ipath);
  		}
--- 1109,1139 ----
  			 * using parallel workers, just free it.
  			 */
  			if (ipath->path.parallel_workers > 0)
! 			{
! 				if (!grouped)
! 					add_partial_path(rel, (Path *) ipath, grouped);
! 				else if (can_agg_sorted && outer_relids == NULL)
! 				{
! 					/* TODO Double-check if this is the correct input value. */
! 					agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 					agg_path = create_partial_agg_sorted_path(root,
! 															  (Path *) ipath,
! 															  false,
! 															  &group_clauses,
! 															  &group_exprs,
! 															  &agg_exprs,
! 															  agg_input_rows);
! 
! 					/*
! 					 * If create_partial_agg_sorted_path succeeded once, it
! 					 * should always succeed.
! 					 */
! 					Assert(agg_path != NULL);
! 
! 					add_partial_path(rel, (Path *) agg_path, grouped);
! 				}
! 			}
  			else
  				pfree(ipath);
  		}
*************** build_index_paths(PlannerInfo *root, Rel
*** 1105,1111 ****
  									  outer_relids,
  									  loop_count,
  									  false);
! 			result = lappend(result, ipath);
  
  			/* If appropriate, consider parallel index scan */
  			if (index->amcanparallel &&
--- 1161,1185 ----
  									  outer_relids,
  									  loop_count,
  									  false);
! 
! 			if (!grouped)
! 				result = lappend(result, ipath);
! 			else if (can_agg_sorted)
! 			{
! 				/* TODO Double-check if this is the correct input value. */
! 				agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 				agg_path = create_partial_agg_sorted_path(root,
! 														  (Path *) ipath,
! 														  true,
! 														  &group_clauses,
! 														  &group_exprs,
! 														  &agg_exprs,
! 														  agg_input_rows);
! 
! 				Assert(agg_path != NULL);
! 				result = lappend(result, agg_path);
! 			}
  
  			/* If appropriate, consider parallel index scan */
  			if (index->amcanparallel &&
*************** build_index_paths(PlannerInfo *root, Rel
*** 1129,1135 ****
  				 * using parallel workers, just free it.
  				 */
  				if (ipath->path.parallel_workers > 0)
! 					add_partial_path(rel, (Path *) ipath);
  				else
  					pfree(ipath);
  			}
--- 1203,1227 ----
  				 * using parallel workers, just free it.
  				 */
  				if (ipath->path.parallel_workers > 0)
! 				{
! 					if (!grouped)
! 						add_partial_path(rel, (Path *) ipath, grouped);
! 					else if (can_agg_sorted && outer_relids == NULL)
! 					{
! 						/* TODO Double-check if this is the correct input value. */
! 						agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 						agg_path = create_partial_agg_sorted_path(root,
! 																  (Path *) ipath,
! 																  false,
! 																  &group_clauses,
! 																  &group_exprs,
! 																  &agg_exprs,
! 																  agg_input_rows);
! 						Assert(agg_path != NULL);
! 						add_partial_path(rel, (Path *) agg_path, grouped);
! 					}
! 				}
  				else
  					pfree(ipath);
  			}
*************** build_paths_for_OR(PlannerInfo *root, Re
*** 1244,1250 ****
  									   useful_predicate,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL);
  		result = list_concat(result, indexpaths);
  	}
  
--- 1336,1343 ----
  									   useful_predicate,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL,
! 									   false);
  		result = list_concat(result, indexpaths);
  	}
  
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
new file mode 100644
index 5aedcd1..f25719f
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 22,34 ****
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
  
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
! #define PATH_PARAM_BY_REL(path, rel)  \
  	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
  
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 22,45 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/tlist.h"
  
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
! /*
!  * Paths parameterized by the parent can be considered to be parameterized by
!  * any of its children.
!  */
! #define PATH_PARAM_BY_PARENT(path, rel)	\
! 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path),	\
! 									   (rel)->top_parent_relids))
! #define PATH_PARAM_BY_REL_SELF(path, rel)  \
  	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
  
+ #define PATH_PARAM_BY_REL(path, rel)	\
+ 	(PATH_PARAM_BY_REL_SELF(path, rel) || PATH_PARAM_BY_PARENT(path, rel))
+ 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
*************** static void try_partial_mergejoin_path(P
*** 38,66 ****
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra);
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
! 					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static void consider_parallel_nestloop(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra);
  static void consider_parallel_mergejoin(PlannerInfo *root,
  							RelOptInfo *joinrel,
  							RelOptInfo *outerrel,
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total);
  static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static List *select_mergejoin_clauses(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *outerrel,
--- 49,97 ----
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped,
! 						   bool do_aggregate);
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
! 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 								 JoinType jointype, JoinPathExtraData *extra,
! 								 bool grouped);
! static void sort_inner_and_outer_common(PlannerInfo *root,
! 										RelOptInfo *joinrel,
! 										RelOptInfo *outerrel,
! 										RelOptInfo *innerrel,
! 										JoinType jointype,
! 										JoinPathExtraData *extra,
! 										bool grouped_outer,
! 										bool grouped_inner,
! 										bool do_aggregate);
  static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra,
! 					 bool grouped);
  static void consider_parallel_nestloop(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped, bool do_aggregate);
  static void consider_parallel_mergejoin(PlannerInfo *root,
  							RelOptInfo *joinrel,
  							RelOptInfo *outerrel,
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total,
! 							bool grouped);
  static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra,
! 					 bool grouped);
! static bool is_grouped_join_target_complete(PlannerInfo *root,
! 											PathTarget *jointarget,
! 											Path *outer_path,
! 											Path *inner_path);
  static List *select_mergejoin_clauses(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *outerrel,
*************** static void generate_mergejoin_paths(Pla
*** 77,83 ****
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial);
  
  
  /*
--- 108,117 ----
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial,
!  						 bool grouped_outer,
! 						 bool grouped_inner,
! 						 bool do_aggregate);
  
  
  /*
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 115,120 ****
--- 149,167 ----
  	JoinPathExtraData extra;
  	bool		mergejoin_allowed = true;
  	ListCell   *lc;
+ 	Relids		joinrelids;
+ 
+ 	/*
+ 	 * PlannerInfo doesn't contain the SpecialJoinInfos created for joins
+ 	 * between child relations, even if there is a SpecialJoinInfo node for
+ 	 * the join between the topmost parents. Hence, while calculating the
+ 	 * Relids set representing the restriction, consider the relids of the
+ 	 * topmost parent of the partitions.
+ 	 */
+ 	if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
+ 		joinrelids = joinrel->top_parent_relids;
+ 	else
+ 		joinrelids = joinrel->relids;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 197,212 ****
  		 * join has already been proven legal.)  If the SJ is relevant, it
  		 * presents constraints for joining to anything not in its RHS.
  		 */
! 		if (bms_overlap(joinrel->relids, sjinfo2->min_righthand) &&
! 			!bms_overlap(joinrel->relids, sjinfo2->min_lefthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													sjinfo2->min_righthand));
  
  		/* full joins constrain both sides symmetrically */
  		if (sjinfo2->jointype == JOIN_FULL &&
! 			bms_overlap(joinrel->relids, sjinfo2->min_lefthand) &&
! 			!bms_overlap(joinrel->relids, sjinfo2->min_righthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													 sjinfo2->min_lefthand));
--- 244,259 ----
  		 * join has already been proven legal.)  If the SJ is relevant, it
  		 * presents constraints for joining to anything not in its RHS.
  		 */
! 		if (bms_overlap(joinrelids, sjinfo2->min_righthand) &&
! 			!bms_overlap(joinrelids, sjinfo2->min_lefthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													sjinfo2->min_righthand));
  
  		/* full joins constrain both sides symmetrically */
  		if (sjinfo2->jointype == JOIN_FULL &&
! 			bms_overlap(joinrelids, sjinfo2->min_lefthand) &&
! 			!bms_overlap(joinrelids, sjinfo2->min_righthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													 sjinfo2->min_lefthand));
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 227,234 ****
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
  	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  	/*
  	 * 2. Consider paths where the outer relation need not be explicitly
--- 274,285 ----
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
  	if (mergejoin_allowed)
+ 	{
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  	/*
  	 * 2. Consider paths where the outer relation need not be explicitly
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 238,245 ****
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
  	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  #ifdef NOT_USED
  
--- 289,300 ----
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
  	if (mergejoin_allowed)
+ 	{
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  #ifdef NOT_USED
  
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 265,272 ****
  	 * joins, because there may be no other alternative.
  	 */
  	if (enable_hashjoin || jointype == JOIN_FULL)
  		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
--- 320,331 ----
  	 * joins, because there may be no other alternative.
  	 */
  	if (enable_hashjoin || jointype == JOIN_FULL)
+ 	{
  		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 304,321 ****
   */
  static inline bool
  allow_star_schema_join(PlannerInfo *root,
! 					   Path *outer_path,
! 					   Path *inner_path)
  {
- 	Relids		innerparams = PATH_REQ_OUTER(inner_path);
- 	Relids		outerrelids = outer_path->parent->relids;
- 
  	/*
  	 * It's a star-schema case if the outer rel provides some but not all of
  	 * the inner rel's parameterization.
  	 */
! 	return (bms_overlap(innerparams, outerrelids) &&
! 			bms_nonempty_difference(innerparams, outerrelids));
  }
  
  /*
--- 363,377 ----
   */
  static inline bool
  allow_star_schema_join(PlannerInfo *root,
! 					   Relids outerrelids,
! 					   Relids inner_paramrels)
  {
  	/*
  	 * It's a star-schema case if the outer rel provides some but not all of
  	 * the inner rel's parameterization.
  	 */
! 	return (bms_overlap(inner_paramrels, outerrelids) &&
! 			bms_nonempty_difference(inner_paramrels, outerrelids));
  }
  
  /*
*************** try_nestloop_path(PlannerInfo *root,
*** 330,339 ****
  				  Path *inner_path,
  				  List *pathkeys,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	/*
  	 * Check to see if proposed path is still parameterized, and reject if the
--- 386,427 ----
  				  Path *inner_path,
  				  List *pathkeys,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra,
! 				  bool grouped,
! 				  bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	RelOptInfo *innerrel = inner_path->parent;
+ 	RelOptInfo *outerrel = outer_path->parent;
+ 	Relids		innerrelids;
+ 	Relids		outerrelids;
+ 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
+ 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+  	Path		*join_path;
+  	PathTarget	*join_target;
+ 
+  	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
+ 
+ 	/*
+ 	 * Parameterized paths in the child relations (base or join) are
+ 	 * parameterized by the top-level parent. Any paths we create that are
+ 	 * parameterized by the child relations are not added to the pathlist.
+ 	 * Hence run parameterization tests on the parent relids.
+ 	 */
+ 	if (innerrel->top_parent_relids)
+ 		innerrelids = innerrel->top_parent_relids;
+ 	else
+ 		innerrelids = innerrel->relids;
+ 
+ 	if (outerrel->top_parent_relids)
+ 		outerrelids = outerrel->top_parent_relids;
+ 	else
+ 		outerrelids = outerrel->relids;
  
  	/*
  	 * Check to see if proposed path is still parameterized, and reject if the
*************** try_nestloop_path(PlannerInfo *root,
*** 341,359 ****
  	 * says to allow it anyway.  Also, we must reject if have_dangerous_phv
  	 * doesn't like the look of it, which could only happen if the nestloop is
  	 * still parameterized.
  	 */
! 	required_outer = calc_nestloop_required_outer(outer_path,
! 												  inner_path);
! 	if (required_outer &&
! 		((!bms_overlap(required_outer, extra->param_source_rels) &&
! 		  !allow_star_schema_join(root, outer_path, inner_path)) ||
! 		 have_dangerous_phv(root,
! 							outer_path->parent->relids,
! 							PATH_REQ_OUTER(inner_path))))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
--- 429,452 ----
  	 * says to allow it anyway.  Also, we must reject if have_dangerous_phv
  	 * doesn't like the look of it, which could only happen if the nestloop is
  	 * still parameterized.
+ 	 *
+ 	 * Grouped path should never be parameterized.
  	 */
! 	required_outer = calc_nestloop_required_outer(outerrelids, outer_paramrels,
! 												  innerrelids, inner_paramrels);
! 	if (required_outer)
  	{
! 		if (grouped ||
! 			(!bms_overlap(required_outer, extra->param_source_rels) &&
! 			 !allow_star_schema_join(root, outerrelids, inner_paramrels)) ||
! 			have_dangerous_phv(root,
! 							   outer_path->parent->relids,
! 							   PATH_REQ_OUTER(inner_path)))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
*************** try_nestloop_path(PlannerInfo *root,
*** 368,388 ****
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_nestloop_path(root,
! 									  joinrel,
! 									  jointype,
! 									  &workspace,
! 									  extra,
! 									  outer_path,
! 									  inner_path,
! 									  extra->restrictlist,
! 									  pathkeys,
! 									  required_outer));
  	}
  	else
  	{
--- 461,522 ----
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
  
!  	/*
!  	 * Determine which target the join should produce.
!  	 *
!  	 * In the case of explicit aggregation, output of the join itself is
!  	 * plain.
!  	 */
!  	if (!grouped || do_aggregate)
!  		join_target = joinrel->reltarget;
!  	else
!  		join_target = joinrel->gpi->target;
! 
!  	join_path = (Path *) create_nestloop_path(root, joinrel, jointype,
!  											  &workspace, extra,
!  											  outer_path, inner_path,
!  											  extra->restrictlist, pathkeys,
!  											  required_outer, join_target);
! 
!  	/* Do partial aggregation if needed. */
!  	if (do_aggregate && required_outer == NULL)
!  	{
!  		create_grouped_path(root, joinrel, join_path, true, false,
!  							AGG_HASHED);
!  		create_grouped_path(root, joinrel, join_path, true, false,
!  							AGG_SORTED);
!  	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer, grouped))
  	{
! 		/*
! 		 * Since the result produced by a child is part of the result produced
! 		 * by its topmost parent and has the same properties, the parameters
! 		 * representing that parent may be substituted by values from a child.
! 		 * Hence expressions, and the paths using those expressions, that are
! 		 * parameterized by a parent can be said to be parameterized by any of
! 		 * its children.  For a join between child relations, if the inner
! 		 * path is parameterized by the parent of the outer relation, create a
! 		 * nestloop join path with the inner relation parameterized by the
! 		 * outer child relation, by translating the inner path accordingly.
! 		 * The translated path should have the same costs as the original
! 		 * path, so the cost check above should still hold.
! 		 */
! 		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
! 		{
! 			inner_path = reparameterize_path_by_child(root, inner_path,
! 													   outer_path->parent);
! 
! 			/*
! 			 * If we could not translate the path, we can't create nest loop
! 			 * path.
! 			 */
! 			if (!inner_path)
! 				return;
! 		}
! 
! 		add_path(joinrel, join_path, grouped);
  	}
  	else
  	{
*************** try_partial_nestloop_path(PlannerInfo *r
*** 403,411 ****
  						  Path *inner_path,
  						  List *pathkeys,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
--- 537,553 ----
  						  Path *inner_path,
  						  List *pathkeys,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool grouped,
! 						  bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_nestloop_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
*************** try_partial_nestloop_path(PlannerInfo *r
*** 428,448 ****
  	 */
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_nestloop_path(root,
! 										  joinrel,
! 										  jointype,
! 										  &workspace,
! 										  extra,
! 										  outer_path,
! 										  inner_path,
! 										  extra->restrictlist,
! 										  pathkeys,
! 										  NULL));
  }
  
  /*
--- 570,650 ----
  	 */
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
! 
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_nestloop_path(root, joinrel, jointype,
! 											  &workspace, extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist, pathkeys,
! 											  NULL, join_target);
! 
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_SORTED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   pathkeys, grouped))
! 	{
! 		/* Might be good enough to be worth trying, so let's try it. */
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! static void
! try_grouped_nestloop_path(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *pathkeys,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool do_aggregate,
! 						  bool partial)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_nestloop_path(root, joinrel, outer_path, inner_path, pathkeys,
! 						  jointype, extra, true, do_aggregate);
! 	else
! 		try_partial_nestloop_path(root, joinrel, outer_path, inner_path,
! 								  pathkeys, jointype, extra, true,
! 								  do_aggregate);
  }
  
  /*
*************** try_mergejoin_path(PlannerInfo *root,
*** 461,470 ****
  				   List *innersortkeys,
  				   JoinType jointype,
  				   JoinPathExtraData *extra,
! 				   bool is_partial)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	if (is_partial)
  	{
--- 663,682 ----
  				   List *innersortkeys,
  				   JoinType jointype,
  				   JoinPathExtraData *extra,
! 				   bool is_partial,
! 				   bool grouped,
! 				   bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	if (is_partial)
  	{
*************** try_mergejoin_path(PlannerInfo *root,
*** 477,498 ****
  								   outersortkeys,
  								   innersortkeys,
  								   jointype,
! 								   extra);
  		return;
  	}
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if the
! 	 * parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path,
! 													  inner_path);
! 	if (required_outer &&
! 		!bms_overlap(required_outer, extra->param_source_rels))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
--- 689,713 ----
  								   outersortkeys,
  								   innersortkeys,
  								   jointype,
! 								   extra,
! 								   grouped,
! 								   do_aggregate);
  		return;
  	}
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if
! 	 * it's grouped or if the parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path, inner_path);
! 	if (required_outer)
  	{
! 		if (grouped || !bms_overlap(required_outer, extra->param_source_rels))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
*************** try_mergejoin_path(PlannerInfo *root,
*** 511,537 ****
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys,
! 						   extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_mergejoin_path(root,
! 									   joinrel,
! 									   jointype,
! 									   &workspace,
! 									   extra,
! 									   outer_path,
! 									   inner_path,
! 									   extra->restrictlist,
! 									   pathkeys,
! 									   required_outer,
! 									   mergeclauses,
! 									   outersortkeys,
! 									   innersortkeys));
  	}
  	else
  	{
--- 726,773 ----
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 		join_target = joinrel->gpi->target;
! 
! 
! 	join_path = (Path *) create_mergejoin_path(root,
! 											   joinrel,
! 											   jointype,
! 											   &workspace,
! 											   extra,
! 											   outer_path,
! 											   inner_path,
! 											   extra->restrictlist,
! 											   pathkeys,
! 											   required_outer,
! 											   mergeclauses,
! 											   outersortkeys,
! 											   innersortkeys,
! 											   join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_SORTED);
! 	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer, grouped))
  	{
! 		add_path(joinrel, (Path *) join_path, grouped);
  	}
  	else
  	{
*************** try_partial_mergejoin_path(PlannerInfo *
*** 555,563 ****
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * See comments in try_partial_hashjoin_path().
--- 791,807 ----
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped,
! 						   bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_mergejoin_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * See comments in try_partial_hashjoin_path().
*************** try_partial_mergejoin_path(PlannerInfo *
*** 587,613 ****
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys,
! 						   extra);
  
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_mergejoin_path(root,
! 										   joinrel,
! 										   jointype,
! 										   &workspace,
! 										   extra,
! 										   outer_path,
! 										   inner_path,
! 										   extra->restrictlist,
! 										   pathkeys,
! 										   NULL,
! 										   mergeclauses,
! 										   outersortkeys,
! 										   innersortkeys));
  }
  
  /*
--- 831,1003 ----
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_mergejoin_path(root,
! 											   joinrel,
! 											   jointype,
! 											   &workspace,
! 											   extra,
! 											   outer_path,
! 											   inner_path,
! 											   extra->restrictlist,
! 											   pathkeys,
! 											   NULL,
! 											   mergeclauses,
! 											   outersortkeys,
! 											   innersortkeys,
! 											   join_target);
! 
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_SORTED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   pathkeys, grouped))
! 	{
! 		/* Might be good enough to be worth trying, so let's try it. */
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! static void
! try_grouped_mergejoin_path(PlannerInfo *root,
! 						   RelOptInfo *joinrel,
! 						   Path *outer_path,
! 						   Path *inner_path,
! 						   List *pathkeys,
! 						   List *mergeclauses,
! 						   List *outersortkeys,
! 						   List *innersortkeys,
! 						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool partial,
! 						   bool do_aggregate)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_mergejoin_path(root, joinrel, outer_path, inner_path, pathkeys,
! 						   mergeclauses, outersortkeys, innersortkeys,
! 						   jointype, extra, false, true, do_aggregate);
! 	else
! 		try_partial_mergejoin_path(root, joinrel, outer_path, inner_path,
! 								   pathkeys,
! 								   mergeclauses, outersortkeys, innersortkeys,
! 								   jointype, extra, true, do_aggregate);
! }
! 
! static void
! try_mergejoin_path_common(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *pathkeys,
! 						  List *mergeclauses,
! 						  List *outersortkeys,
! 						  List *innersortkeys,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool partial,
! 						  bool grouped_outer,
! 						  bool grouped_inner,
! 						  bool do_aggregate)
! {
! 	bool		grouped_join;
! 
! 	grouped_join = grouped_outer || grouped_inner || do_aggregate;
! 
! 	/* Join of two grouped paths is not supported. */
! 	Assert(!(grouped_outer && grouped_inner));
! 
! 	if (!grouped_join)
! 	{
! 		/* Only join plain paths. */
! 		try_mergejoin_path(root,
! 						   joinrel,
! 						   outer_path,
! 						   inner_path,
! 						   pathkeys,
! 						   mergeclauses,
! 						   outersortkeys,
! 						   innersortkeys,
! 						   jointype,
! 						   extra,
! 						   partial,
! 						   false, false);
! 	}
! 	else if (grouped_outer || grouped_inner)
! 	{
! 		Assert(!do_aggregate);
! 
! 		/*
! 		 * Exactly one of the input paths is grouped, so create a grouped join
! 		 * path.
! 		 */
! 		try_grouped_mergejoin_path(root,
! 								   joinrel,
! 								   outer_path,
! 								   inner_path,
! 								   pathkeys,
! 								   mergeclauses,
! 								   outersortkeys,
! 								   innersortkeys,
! 								   jointype,
! 								   extra,
! 								   partial,
! 								   false);
! 	}
! 	/* Perform explicit aggregation only if suitable target exists. */
! 	else if (joinrel->gpi != NULL)
! 	{
! 		try_grouped_mergejoin_path(root,
! 								   joinrel,
! 								   outer_path,
! 								   inner_path,
! 								   pathkeys,
! 								   mergeclauses,
! 								   outersortkeys,
! 								   innersortkeys,
! 								   jointype,
! 								   extra,
! 								   partial, true);
! 	}
  }
  
  /*
*************** try_hashjoin_path(PlannerInfo *root,
*** 622,668 ****
  				  Path *inner_path,
  				  List *hashclauses,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if the
! 	 * parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path,
! 													  inner_path);
! 	if (required_outer &&
! 		!bms_overlap(required_outer, extra->param_source_rels))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
  	 * See comments in try_nestloop_path().  Also note that hashjoin paths
  	 * never have any output pathkeys, per comments in create_hashjoin_path.
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  NIL, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_hashjoin_path(root,
! 									  joinrel,
! 									  jointype,
! 									  &workspace,
! 									  extra,
! 									  outer_path,
! 									  inner_path,
! 									  extra->restrictlist,
! 									  required_outer,
! 									  hashclauses));
  	}
  	else
  	{
--- 1012,1086 ----
  				  Path *inner_path,
  				  List *hashclauses,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra,
! 				  bool grouped,
! 				  bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if
! 	 * it's grouped or if the parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path, inner_path);
! 	if (required_outer)
  	{
! 		if (grouped || !bms_overlap(required_outer, extra->param_source_rels))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
  	 * See comments in try_nestloop_path().  Also note that hashjoin paths
  	 * never have any output pathkeys, per comments in create_hashjoin_path.
+ 	 *
+ 	 * TODO Need to consider aggregation here?
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 		join_target = joinrel->gpi->target;
! 
! 	join_path = (Path *) create_hashjoin_path(root, joinrel, jointype,
! 											  &workspace,
! 											  extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist,
! 											  required_outer, hashclauses,
! 											  join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_HASHED);
! 	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  NIL, required_outer, grouped))
  	{
! 		add_path(joinrel, (Path *) join_path, grouped);
  	}
  	else
  	{
*************** try_partial_hashjoin_path(PlannerInfo *r
*** 683,691 ****
  						  Path *inner_path,
  						  List *hashclauses,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
--- 1101,1117 ----
  						  Path *inner_path,
  						  List *hashclauses,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool grouped,
! 						  bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_hashjoin_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
*************** try_partial_hashjoin_path(PlannerInfo *r
*** 708,728 ****
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_hashjoin_path(root,
! 										  joinrel,
! 										  jointype,
! 										  &workspace,
! 										  extra,
! 										  outer_path,
! 										  inner_path,
! 										  extra->restrictlist,
! 										  NULL,
! 										  hashclauses));
  }
  
  /*
--- 1134,1229 ----
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
! 
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_hashjoin_path(root, joinrel, jointype,
! 											  &workspace,
! 											  extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist, NULL,
! 											  hashclauses, join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   NIL, grouped))
! 	{
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! /*
!  * Create a new grouped hash join path by joining a grouped path to plain
!  * (non-grouped) one, or by joining 2 plain relations and applying grouping on
!  * the result.
!  *
!  * Joining of 2 grouped paths is not supported. If a grouped relation A was
!  * joined to grouped relation B, then the grouping of B reduces the number of
!  * times each group of A appears in the join output. This makes a difference
!  * for some aggregates, e.g. sum().
!  *
!  * If do_aggregate is true, neither input rel is grouped so we need to
!  * aggregate the join result explicitly.
!  *
!  * partial argument tells whether the join path should be considered partial.
!  */
! static void
! try_grouped_hashjoin_path(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *hashclauses,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool do_aggregate,
! 						  bool partial)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_hashjoin_path(root, joinrel, outer_path, inner_path, hashclauses,
! 						  jointype, extra, true, do_aggregate);
! 	else
! 		try_partial_hashjoin_path(root, joinrel, outer_path, inner_path,
! 								  hashclauses, jointype, extra, true,
! 								  do_aggregate);
  }
  
  /*
*************** sort_inner_and_outer(PlannerInfo *root,
*** 773,779 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	Path	   *outer_path;
--- 1274,1313 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
! {
! 	if (!grouped)
! 	{
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, false, false);
! 	}
! 	else
! 	{
! 		/* Use all the supported strategies to generate grouped join. */
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, true, false, false);
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, true, false);
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, false, true);
! 	}
! }
! 
! /*
!  * TODO As merge_pathkeys shouldn't differ across execution, use a separate
!  * function to derive them and pass them here in a list.
!  */
! static void
! sort_inner_and_outer_common(PlannerInfo *root,
! 							RelOptInfo *joinrel,
! 							RelOptInfo *outerrel,
! 							RelOptInfo *innerrel,
! 							JoinType jointype,
! 							JoinPathExtraData *extra,
! 							bool grouped_outer,
! 							bool grouped_inner,
! 							bool do_aggregate)
  {
  	JoinType	save_jointype = jointype;
  	Path	   *outer_path;
*************** sort_inner_and_outer(PlannerInfo *root,
*** 782,787 ****
--- 1316,1322 ----
  	Path	   *cheapest_safe_inner = NULL;
  	List	   *all_pathkeys;
  	ListCell   *l;
+ 	bool	grouped_result;
  
  	/*
  	 * We only consider the cheapest-total-cost input paths, since we are
*************** sort_inner_and_outer(PlannerInfo *root,
*** 796,803 ****
  	 * against mergejoins with parameterized inputs; see comments in
  	 * src/backend/optimizer/README.
  	 */
! 	outer_path = outerrel->cheapest_total_path;
! 	inner_path = innerrel->cheapest_total_path;
  
  	/*
  	 * If either cheapest-total path is parameterized by the other rel, we
--- 1331,1357 ----
  	 * against mergejoins with parameterized inputs; see comments in
  	 * src/backend/optimizer/README.
  	 */
! 	if (grouped_outer)
! 	{
! 		if (outerrel->gpi != NULL && outerrel->gpi->pathlist != NIL)
! 			outer_path = linitial(outerrel->gpi->pathlist);
! 		else
! 			return;
! 	}
! 	else
! 		outer_path = outerrel->cheapest_total_path;
! 
! 	if (grouped_inner)
! 	{
! 		if (innerrel->gpi != NULL && innerrel->gpi->pathlist != NIL)
! 			inner_path = linitial(innerrel->gpi->pathlist);
! 		else
! 			return;
! 	}
! 	else
! 		inner_path = innerrel->cheapest_total_path;
! 
! 	grouped_result = grouped_outer || grouped_inner || do_aggregate;
  
  	/*
  	 * If either cheapest-total path is parameterized by the other rel, we
*************** sort_inner_and_outer(PlannerInfo *root,
*** 843,855 ****
  		outerrel->partial_pathlist != NIL &&
  		bms_is_empty(joinrel->lateral_relids))
  	{
! 		cheapest_partial_outer = (Path *) linitial(outerrel->partial_pathlist);
  
  		if (inner_path->parallel_safe)
  			cheapest_safe_inner = inner_path;
  		else if (save_jointype != JOIN_UNIQUE_INNER)
  			cheapest_safe_inner =
! 				get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  	}
  
  	/*
--- 1397,1446 ----
  		outerrel->partial_pathlist != NIL &&
  		bms_is_empty(joinrel->lateral_relids))
  	{
! 		if (grouped_outer)
! 		{
! 			if (outerrel->gpi != NULL && outerrel->gpi->partial_pathlist != NIL)
! 				cheapest_partial_outer = (Path *)
! 					linitial(outerrel->gpi->partial_pathlist);
! 			else
! 				return;
! 		}
! 		else
! 			cheapest_partial_outer = (Path *)
! 				linitial(outerrel->partial_pathlist);
! 
! 		if (grouped_inner)
! 		{
! 			if (innerrel->gpi != NULL && innerrel->gpi->pathlist != NIL)
! 				inner_path = linitial(innerrel->gpi->pathlist);
! 			else
! 				return;
! 		}
! 		else
! 			inner_path = innerrel->cheapest_total_path;
  
  		if (inner_path->parallel_safe)
  			cheapest_safe_inner = inner_path;
  		else if (save_jointype != JOIN_UNIQUE_INNER)
+ 		{
+ 			List	*inner_pathlist;
+ 
+ 			if (!grouped_inner)
+ 				inner_pathlist = innerrel->pathlist;
+ 			else
+ 			{
+ 				Assert(innerrel->gpi != NULL);
+ 				inner_pathlist = innerrel->gpi->pathlist;
+ 			}
+ 
+ 			/*
+ 			 * All the grouped paths should be unparameterized, so the
+ 			 * function is overly stringent in the grouped_inner case, but
+ 			 * still useful.
+ 			 */
  			cheapest_safe_inner =
! 				get_cheapest_parallel_safe_total_inner(inner_pathlist);
! 		}
  	}
  
  	/*
*************** sort_inner_and_outer(PlannerInfo *root,
*** 925,957 ****
  		 * properly.  try_mergejoin_path will detect that case and suppress an
  		 * explicit sort step, so we needn't do so here.
  		 */
! 		try_mergejoin_path(root,
! 						   joinrel,
! 						   outer_path,
! 						   inner_path,
! 						   merge_pathkeys,
! 						   cur_mergeclauses,
! 						   outerkeys,
! 						   innerkeys,
! 						   jointype,
! 						   extra,
! 						   false);
  
  		/*
  		 * If we have partial outer and parallel safe inner path then try
  		 * partial mergejoin path.
  		 */
  		if (cheapest_partial_outer && cheapest_safe_inner)
! 			try_partial_mergejoin_path(root,
! 									   joinrel,
! 									   cheapest_partial_outer,
! 									   cheapest_safe_inner,
! 									   merge_pathkeys,
! 									   cur_mergeclauses,
! 									   outerkeys,
! 									   innerkeys,
! 									   jointype,
! 									   extra);
  	}
  }
  
--- 1516,1574 ----
  		 * properly.  try_mergejoin_path will detect that case and suppress an
  		 * explicit sort step, so we needn't do so here.
  		 */
! 		if (!grouped_result)
! 			try_mergejoin_path(root,
! 							   joinrel,
! 							   outer_path,
! 							   inner_path,
! 							   merge_pathkeys,
! 							   cur_mergeclauses,
! 							   outerkeys,
! 							   innerkeys,
! 							   jointype,
! 							   extra,
! 							   false, false, false);
! 		else
! 		{
! 			try_mergejoin_path_common(root, joinrel, outer_path, inner_path,
! 									  merge_pathkeys, cur_mergeclauses,
! 									  outerkeys, innerkeys, jointype, extra,
! 									  false,
! 									  grouped_outer, grouped_inner,
! 									  do_aggregate);
! 		}
  
  		/*
  		 * If we have partial outer and parallel safe inner path then try
  		 * partial mergejoin path.
  		 */
  		if (cheapest_partial_outer && cheapest_safe_inner)
! 		{
! 			if (!grouped_result)
! 			{
! 				try_partial_mergejoin_path(root,
! 										   joinrel,
! 										   cheapest_partial_outer,
! 										   cheapest_safe_inner,
! 										   merge_pathkeys,
! 										   cur_mergeclauses,
! 										   outerkeys,
! 										   innerkeys,
! 										   jointype,
! 										   extra, false, false);
! 			}
! 			else
! 			{
! 				try_mergejoin_path_common(root, joinrel,
! 										  cheapest_partial_outer,
! 										  cheapest_safe_inner,
! 										  merge_pathkeys, cur_mergeclauses,
! 										  outerkeys, innerkeys, jointype, extra,
! 										  true,
! 										  grouped_outer, grouped_inner,
! 										  do_aggregate);
! 			}
! 		}
  	}
  }
  
*************** sort_inner_and_outer(PlannerInfo *root,
*** 968,973 ****
--- 1585,1598 ----
   * some sort key requirements).  So, we consider truncations of the
   * mergeclause list as well as the full list.  (Ideally we'd consider all
   * subsets of the mergeclause list, but that seems way too expensive.)
+  *
+  * grouped_outer - is outerpath grouped?
+  * grouped_inner - use grouped paths of innerrel?
+  * do_aggregate - apply (partial) aggregation to the output?
+  *
+  * TODO If subsequent calls often differ only by the 3 arguments above,
+  * consider a workspace structure to share useful info (eg merge clauses)
+  * across calls.
   */
  static void
  generate_mergejoin_paths(PlannerInfo *root,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 979,985 ****
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial)
  {
  	List	   *mergeclauses;
  	List	   *innersortkeys;
--- 1604,1613 ----
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial,
! 						 bool grouped_outer,
! 						 bool grouped_inner,
! 						 bool do_aggregate)
  {
  	List	   *mergeclauses;
  	List	   *innersortkeys;
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1030,1046 ****
  	 * try_mergejoin_path will do the right thing if inner_cheapest_total is
  	 * already correctly sorted.)
  	 */
! 	try_mergejoin_path(root,
! 					   joinrel,
! 					   outerpath,
! 					   inner_cheapest_total,
! 					   merge_pathkeys,
! 					   mergeclauses,
! 					   NIL,
! 					   innersortkeys,
! 					   jointype,
! 					   extra,
! 					   is_partial);
  
  	/* Can't do anything else if inner path needs to be unique'd */
  	if (save_jointype == JOIN_UNIQUE_INNER)
--- 1658,1675 ----
  	 * try_mergejoin_path will do the right thing if inner_cheapest_total is
  	 * already correctly sorted.)
  	 */
! 	try_mergejoin_path_common(root,
! 							  joinrel,
! 							  outerpath,
! 							  inner_cheapest_total,
! 							  merge_pathkeys,
! 							  mergeclauses,
! 							  NIL,
! 							  innersortkeys,
! 							  jointype,
! 							  extra,
! 							  is_partial,
! 							  grouped_outer, grouped_inner, do_aggregate);
  
  	/* Can't do anything else if inner path needs to be unique'd */
  	if (save_jointype == JOIN_UNIQUE_INNER)
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1096,1111 ****
  
  	for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
  	{
  		Path	   *innerpath;
  		List	   *newclauses = NIL;
  
  		/*
  		 * Look for an inner path ordered well enough for the first
  		 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
  		 * destructively, which is why we made a copy...
  		 */
  		trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
! 		innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
  												   trialsortkeys,
  												   NULL,
  												   TOTAL_COST,
--- 1725,1746 ----
  
  	for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
  	{
+ 		List		*inner_pathlist = NIL;
  		Path	   *innerpath;
  		List	   *newclauses = NIL;
  
+ 		if (!grouped_inner)
+ 			inner_pathlist = innerrel->pathlist;
+ 		else if (innerrel->gpi != NULL)
+ 			inner_pathlist = innerrel->gpi->pathlist;
+ 
  		/*
  		 * Look for an inner path ordered well enough for the first
  		 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
  		 * destructively, which is why we made a copy...
  		 */
  		trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
! 		innerpath = get_cheapest_path_for_pathkeys(inner_pathlist,
  												   trialsortkeys,
  												   NULL,
  												   TOTAL_COST,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1128,1148 ****
  			}
  			else
  				newclauses = mergeclauses;
! 			try_mergejoin_path(root,
! 							   joinrel,
! 							   outerpath,
! 							   innerpath,
! 							   merge_pathkeys,
! 							   newclauses,
! 							   NIL,
! 							   NIL,
! 							   jointype,
! 							   extra,
! 							   is_partial);
  			cheapest_total_inner = innerpath;
  		}
  		/* Same on the basis of cheapest startup cost ... */
! 		innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
  												   trialsortkeys,
  												   NULL,
  												   STARTUP_COST,
--- 1763,1787 ----
  			}
  			else
  				newclauses = mergeclauses;
! 
! 			try_mergejoin_path_common(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  merge_pathkeys,
! 									  newclauses,
! 									  NIL,
! 									  NIL,
! 									  jointype,
! 									  extra,
! 									  is_partial,
! 									  grouped_outer, grouped_inner,
! 									  do_aggregate);
! 
  			cheapest_total_inner = innerpath;
  		}
  		/* Same on the basis of cheapest startup cost ... */
! 		innerpath = get_cheapest_path_for_pathkeys(inner_pathlist,
  												   trialsortkeys,
  												   NULL,
  												   STARTUP_COST,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1173,1189 ****
  					else
  						newclauses = mergeclauses;
  				}
! 				try_mergejoin_path(root,
! 								   joinrel,
! 								   outerpath,
! 								   innerpath,
! 								   merge_pathkeys,
! 								   newclauses,
! 								   NIL,
! 								   NIL,
! 								   jointype,
! 								   extra,
! 								   is_partial);
  			}
  			cheapest_startup_inner = innerpath;
  		}
--- 1812,1830 ----
  					else
  						newclauses = mergeclauses;
  				}
! 				try_mergejoin_path_common(root,
! 										  joinrel,
! 										  outerpath,
! 										  innerpath,
! 										  merge_pathkeys,
! 										  newclauses,
! 										  NIL,
! 										  NIL,
! 										  jointype,
! 										  extra,
! 										  is_partial,
! 										  grouped_outer, grouped_inner,
! 										  do_aggregate);
  			}
  			cheapest_startup_inner = innerpath;
  		}
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1218,1223 ****
--- 1859,1866 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  * 'grouped' indicates that at least one relation in the join has been
+  * aggregated.
   */
  static void
  match_unsorted_outer(PlannerInfo *root,
*************** match_unsorted_outer(PlannerInfo *root,
*** 1225,1231 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	bool		nestjoinOK;
--- 1868,1875 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
  {
  	JoinType	save_jointype = jointype;
  	bool		nestjoinOK;
*************** match_unsorted_outer(PlannerInfo *root,
*** 1235,1240 ****
--- 1879,1906 ----
  	ListCell   *lc1;
  
  	/*
+ 	 * If grouped join path is requested, we ignore cases where either input
+ 	 * path needs to be unique. For each side we should expect either grouped
+ 	 * or plain relation, which differ quite a bit.
+ 	 *
+ 	 * XXX Although unique-ification of grouped path might result in too
+ 	 * expensive input path (note that grouped input relation is not
+ 	 * necessarily unique, regardless of the grouping keys --- one or more plain
+ 	 * relations could already have been joined to it), we might want to
+ 	 * unique-ify the input relation in the future at least in the case it's a
+ 	 * plain relation.
+ 	 *
+ 	 * (Materialization is not involved in grouped paths for similar reasons.)
+ 	 */
+ 	if (grouped &&
+ 		(jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER))
+ 		return;
+ 
+ 	/* No grouped join w/o grouped target. */
+ 	if (grouped && joinrel->gpi == NULL)
+ 		return;
+ 
+ 	/*
  	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
  	 * are doing a right or full mergejoin, we must use *all* the mergeclauses
  	 * as join clauses, else we will not have a valid plan.  (Although these
*************** match_unsorted_outer(PlannerInfo *root,
*** 1290,1296 ****
  			create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
  		Assert(inner_cheapest_total);
  	}
! 	else if (nestjoinOK)
  	{
  		/*
  		 * Consider materializing the cheapest inner path, unless
--- 1956,1962 ----
  			create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
  		Assert(inner_cheapest_total);
  	}
! 	else if (nestjoinOK && !grouped)
  	{
  		/*
  		 * Consider materializing the cheapest inner path, unless
*************** match_unsorted_outer(PlannerInfo *root,
*** 1321,1326 ****
--- 1987,1994 ----
  		 */
  		if (save_jointype == JOIN_UNIQUE_OUTER)
  		{
+ 			Assert(!grouped);
+ 
  			if (outerpath != outerrel->cheapest_total_path)
  				continue;
  			outerpath = (Path *) create_unique_path(root, outerrel,
*************** match_unsorted_outer(PlannerInfo *root,
*** 1348,1354 ****
  							  inner_cheapest_total,
  							  merge_pathkeys,
  							  jointype,
! 							  extra);
  		}
  		else if (nestjoinOK)
  		{
--- 2016,2023 ----
  							  inner_cheapest_total,
  							  merge_pathkeys,
  							  jointype,
! 							  extra,
! 							  false, false);
  		}
  		else if (nestjoinOK)
  		{
*************** match_unsorted_outer(PlannerInfo *root,
*** 1364,1387 ****
  			{
  				Path	   *innerpath = (Path *) lfirst(lc2);
  
! 				try_nestloop_path(root,
! 								  joinrel,
! 								  outerpath,
! 								  innerpath,
! 								  merge_pathkeys,
! 								  jointype,
! 								  extra);
  			}
  
! 			/* Also consider materialized form of the cheapest inner path */
! 			if (matpath != NULL)
  				try_nestloop_path(root,
  								  joinrel,
  								  outerpath,
  								  matpath,
  								  merge_pathkeys,
  								  jointype,
! 								  extra);
  		}
  
  		/* Can't do anything else if outer path needs to be unique'd */
--- 2033,2078 ----
  			{
  				Path	   *innerpath = (Path *) lfirst(lc2);
  
! 				if (!grouped)
! 					try_nestloop_path(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  merge_pathkeys,
! 									  jointype,
! 									  extra, false, false);
! 				else
! 				{
! 					/*
! 					 * Since both input paths are plain, request explicit
! 					 * aggregation.
! 					 */
! 					try_grouped_nestloop_path(root,
! 											  joinrel,
! 											  outerpath,
! 											  innerpath,
! 											  merge_pathkeys,
! 											  jointype,
! 											  extra,
! 											  true,
! 											  false);
! 				}
  			}
  
! 			/*
! 			 * Also consider materialized form of the cheapest inner path.
! 			 *
! 			 * (There's no matpath for grouped join.)
! 			 */
! 			if (matpath != NULL && !grouped)
  				try_nestloop_path(root,
  								  joinrel,
  								  outerpath,
  								  matpath,
  								  merge_pathkeys,
  								  jointype,
! 								  extra,
! 								  false, false);
  		}
  
  		/* Can't do anything else if outer path needs to be unique'd */
*************** match_unsorted_outer(PlannerInfo *root,
*** 1396,1402 ****
  		generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
  								 save_jointype, extra, useallclauses,
  								 inner_cheapest_total, merge_pathkeys,
! 								 false);
  	}
  
  	/*
--- 2087,2163 ----
  		generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
  								 save_jointype, extra, useallclauses,
  								 inner_cheapest_total, merge_pathkeys,
! 								 false, false, false, grouped);
! 
! 		/* Try to join the plain outer relation to grouped inner. */
! 		if (grouped && nestjoinOK &&
! 			save_jointype != JOIN_UNIQUE_OUTER &&
! 			save_jointype != JOIN_UNIQUE_INNER &&
! 			innerrel->gpi != NULL && outerrel->gpi == NULL)
! 		{
! 			Path	*inner_cheapest_grouped = (Path *) linitial(innerrel->gpi->pathlist);
! 
! 			if (PATH_PARAM_BY_REL(inner_cheapest_grouped, outerrel))
! 				continue;
! 
! 			/* XXX Shouldn't Assert() be used here instead? */
! 			if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 				continue;
! 
! 			/*
! 			 * Only outer grouped path is interesting in this case: grouped
! 			 * path on the inner side of NL join would imply repeated
! 			 * aggregation somewhere in the inner path.
! 			 */
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 save_jointype, extra, useallclauses,
! 									 inner_cheapest_grouped, merge_pathkeys,
! 									 false, false, true, false);
! 		}
! 	}
! 
! 	/*
! 	 * Combine grouped outer and plain inner paths.
! 	 */
! 	if (grouped && nestjoinOK &&
! 		save_jointype != JOIN_UNIQUE_OUTER &&
! 		save_jointype != JOIN_UNIQUE_INNER)
! 	{
! 		/*
! 		 * If the inner rel had a grouped target, its plain paths should be
! 		 * ignored. Otherwise we could create grouped paths with different
! 		 * targets.
! 		 */
! 		if (outerrel->gpi != NULL && innerrel->gpi == NULL &&
! 			inner_cheapest_total != NULL)
! 		{
! 			/* Nested loop paths. */
! 			foreach(lc1, outerrel->gpi->pathlist)
! 			{
! 				Path	   *outerpath = (Path *) lfirst(lc1);
! 				List	*merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
! 															  outerpath->pathkeys);
! 
! 				if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 					continue;
! 
! 				try_grouped_nestloop_path(root,
! 										  joinrel,
! 										  outerpath,
! 										  inner_cheapest_total,
! 										  merge_pathkeys,
! 										  jointype,
! 										  extra,
! 										  false,
! 										  false);
! 
! 				/* Merge join paths. */
! 				generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 										 save_jointype, extra, useallclauses,
! 										 inner_cheapest_total, merge_pathkeys,
! 										 false, true, false, false);
! 			}
! 		}
  	}
  
  	/*
*************** match_unsorted_outer(PlannerInfo *root,
*** 1416,1423 ****
  		bms_is_empty(joinrel->lateral_relids))
  	{
  		if (nestjoinOK)
! 			consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 									   save_jointype, extra);
  
  		/*
  		 * If inner_cheapest_total is NULL or non parallel-safe then find the
--- 2177,2197 ----
  		bms_is_empty(joinrel->lateral_relids))
  	{
  		if (nestjoinOK)
! 		{
! 			if (!grouped)
! 				/* Plain partial paths. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 									   save_jointype, extra, false, false);
! 			else
! 			{
! 				/* Grouped partial paths with explicit aggregation. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 										   save_jointype, extra, true, true);
! 				/* Grouped partial paths w/o explicit aggregation. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 										   save_jointype, extra, true, false);
! 			}
! 		}
  
  		/*
  		 * If inner_cheapest_total is NULL or non parallel-safe then find the
*************** match_unsorted_outer(PlannerInfo *root,
*** 1437,1443 ****
  		if (inner_cheapest_total)
  			consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
  										save_jointype, extra,
! 										inner_cheapest_total);
  	}
  }
  
--- 2211,2217 ----
  		if (inner_cheapest_total)
  			consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
  										save_jointype, extra,
! 										inner_cheapest_total, grouped);
  	}
  }
  
*************** consider_parallel_mergejoin(PlannerInfo
*** 1460,1469 ****
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total)
  {
  	ListCell   *lc1;
  
  	/* generate merge join path for each partial outer path */
  	foreach(lc1, outerrel->partial_pathlist)
  	{
--- 2234,2252 ----
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total,
! 							bool grouped)
  {
  	ListCell   *lc1;
  
+ 	if (grouped)
+ 	{
+ 		/* TODO Consider if these types should be supported. */
+ 		if (jointype == JOIN_UNIQUE_OUTER ||
+ 			jointype == JOIN_UNIQUE_INNER)
+ 			return;
+ 	}
+ 
  	/* generate merge join path for each partial outer path */
  	foreach(lc1, outerrel->partial_pathlist)
  	{
*************** consider_parallel_mergejoin(PlannerInfo
*** 1476,1484 ****
  		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
  											 outerpath->pathkeys);
  
! 		generate_mergejoin_paths(root, joinrel, innerrel, outerpath, jointype,
! 								 extra, false, inner_cheapest_total,
! 								 merge_pathkeys, true);
  	}
  }
  
--- 2259,2314 ----
  		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
  											 outerpath->pathkeys);
  
! 		if (!grouped)
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true,
! 									 false, false, false);
! 		else
! 		{
! 			/*
! 			 * Create grouped join by joining plain rels and aggregating the
! 			 * result.
! 			 */
! 			Assert(joinrel->gpi != NULL);
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true, false, false, true);
! 
! 			/* Combine the plain outer with grouped inner one(s). */
! 			if (outerrel->gpi == NULL && innerrel->gpi != NULL)
! 			{
! 				Path	*inner_cheapest_grouped = (Path *)
! 					linitial(innerrel->gpi->pathlist);
! 
! 				if (inner_cheapest_grouped != NULL &&
! 					inner_cheapest_grouped->parallel_safe)
! 					generate_mergejoin_paths(root, joinrel, innerrel,
! 											 outerpath, jointype, extra,
! 											 false, inner_cheapest_grouped,
! 											 merge_pathkeys,
! 											 true, false, true, false);
! 			}
! 		}
! 	}
! 
! 	/* In addition, try to join grouped outer to plain inner one(s).  */
! 	if (grouped && outerrel->gpi != NULL && innerrel->gpi == NULL)
! 	{
! 		foreach(lc1, outerrel->gpi->partial_pathlist)
! 		{
! 			Path	   *outerpath = (Path *) lfirst(lc1);
! 			List	   *merge_pathkeys;
! 
! 			merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
! 												 outerpath->pathkeys);
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true, true, false, false);
! 		}
  	}
  }
  
*************** consider_parallel_nestloop(PlannerInfo *
*** 1499,1513 ****
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	ListCell   *lc1;
  
  	if (jointype == JOIN_UNIQUE_INNER)
  		jointype = JOIN_INNER;
  
! 	foreach(lc1, outerrel->partial_pathlist)
  	{
  		Path	   *outerpath = (Path *) lfirst(lc1);
  		List	   *pathkeys;
--- 2329,2373 ----
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped, bool do_aggregate)
  {
  	JoinType	save_jointype = jointype;
+ 	List		*outer_pathlist;
  	ListCell   *lc1;
  
+ 	if (grouped)
+ 	{
+ 		/* TODO Consider if these types should be supported. */
+ 		if (save_jointype == JOIN_UNIQUE_OUTER ||
+ 			save_jointype == JOIN_UNIQUE_INNER)
+ 			return;
+ 	}
+ 
  	if (jointype == JOIN_UNIQUE_INNER)
  		jointype = JOIN_INNER;
  
! 	if (!grouped || do_aggregate)
! 	{
! 		/*
! 		 * If creating grouped paths by explicit aggregation, the input paths
! 		 * must be plain.
! 		 */
! 		outer_pathlist = outerrel->partial_pathlist;
! 	}
! 	else if (outerrel->gpi != NULL)
! 	{
! 		/*
! 		 * Only the outer paths are accepted as grouped when we try to combine
! 		 * grouped and plain ones. Grouped inner path implies repeated
! 		 * aggregation, which doesn't sound like a good idea.
! 		 */
! 		outer_pathlist = outerrel->gpi->partial_pathlist;
! 	}
! 	else
! 		return;
! 
! 	foreach(lc1, outer_pathlist)
  	{
  		Path	   *outerpath = (Path *) lfirst(lc1);
  		List	   *pathkeys;
*************** consider_parallel_nestloop(PlannerInfo *
*** 1538,1544 ****
  			 * inner paths, but right now create_unique_path is not on board
  			 * with that.)
  			 */
! 			if (save_jointype == JOIN_UNIQUE_INNER)
  			{
  				if (innerpath != innerrel->cheapest_total_path)
  					continue;
--- 2398,2404 ----
  			 * inner paths, but right now create_unique_path is not on board
  			 * with that.)
  			 */
! 			if (save_jointype == JOIN_UNIQUE_INNER && !grouped)
  			{
  				if (innerpath != innerrel->cheapest_total_path)
  					continue;
*************** consider_parallel_nestloop(PlannerInfo *
*** 1548,1555 ****
  				Assert(innerpath);
  			}
  
! 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
! 									  pathkeys, jointype, extra);
  		}
  	}
  }
--- 2408,2433 ----
  				Assert(innerpath);
  			}
  
! 			if (!grouped)
! 				try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  false, false);
! 			else if (do_aggregate)
! 			{
! 				/* Request aggregation as both input rels are plain. */
! 				try_grouped_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  true, true);
! 			}
! 			/*
! 			 * Only combine the grouped outer path with the plain inner if the
! 			 * inner relation cannot produce grouped paths. Otherwise we could
! 			 * generate grouped paths with different targets.
! 			 */
! 			else if (innerrel->gpi == NULL)
! 				try_grouped_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  false, true);
  		}
  	}
  }
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1571,1583 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	bool		isouterjoin = IS_OUTER_JOIN(jointype);
  	List	   *hashclauses;
  	ListCell   *l;
  
  	/*
  	 * We need to build only one hashclauses list for any given pair of outer
  	 * and inner relations; all of the hashable clauses will be used as keys.
--- 2449,2466 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
  {
  	JoinType	save_jointype = jointype;
  	bool		isouterjoin = IS_OUTER_JOIN(jointype);
  	List	   *hashclauses;
  	ListCell   *l;
  
+ 	/* No grouped join w/o grouped target. */
+ 	if (grouped && joinrel->gpi == NULL)
+ 		return;
+ 
  	/*
  	 * We need to build only one hashclauses list for any given pair of outer
  	 * and inner relations; all of the hashable clauses will be used as keys.
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1627,1632 ****
--- 2510,2518 ----
  		 * can't use a hashjoin.  (There's no use looking for alternative
  		 * input paths, since these should already be the least-parameterized
  		 * available paths.)
+ 		 *
+ 		 * (The same check should work for grouped paths, as these don't
+ 		 * differ in parameterization.)
  		 */
  		if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
  			PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1646,1652 ****
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra);
  			/* no possibility of cheap startup here */
  		}
  		else if (jointype == JOIN_UNIQUE_INNER)
--- 2532,2539 ----
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra,
! 							  false, false);
  			/* no possibility of cheap startup here */
  		}
  		else if (jointype == JOIN_UNIQUE_INNER)
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1662,1668 ****
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra);
  			if (cheapest_startup_outer != NULL &&
  				cheapest_startup_outer != cheapest_total_outer)
  				try_hashjoin_path(root,
--- 2549,2556 ----
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra,
! 							  false, false);
  			if (cheapest_startup_outer != NULL &&
  				cheapest_startup_outer != cheapest_total_outer)
  				try_hashjoin_path(root,
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1671,1733 ****
  								  cheapest_total_inner,
  								  hashclauses,
  								  jointype,
! 								  extra);
  		}
  		else
  		{
! 			/*
! 			 * For other jointypes, we consider the cheapest startup outer
! 			 * together with the cheapest total inner, and then consider
! 			 * pairings of cheapest-total paths including parameterized ones.
! 			 * There is no use in generating parameterized paths on the basis
! 			 * of possibly cheap startup cost, so this is sufficient.
! 			 */
! 			ListCell   *lc1;
! 			ListCell   *lc2;
! 
! 			if (cheapest_startup_outer != NULL)
! 				try_hashjoin_path(root,
! 								  joinrel,
! 								  cheapest_startup_outer,
! 								  cheapest_total_inner,
! 								  hashclauses,
! 								  jointype,
! 								  extra);
! 
! 			foreach(lc1, outerrel->cheapest_parameterized_paths)
  			{
- 				Path	   *outerpath = (Path *) lfirst(lc1);
- 
  				/*
! 				 * We cannot use an outer path that is parameterized by the
! 				 * inner rel.
  				 */
! 				if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 					continue;
  
! 				foreach(lc2, innerrel->cheapest_parameterized_paths)
  				{
! 					Path	   *innerpath = (Path *) lfirst(lc2);
  
  					/*
! 					 * We cannot use an inner path that is parameterized by
! 					 * the outer rel, either.
  					 */
! 					if (PATH_PARAM_BY_REL(innerpath, outerrel))
  						continue;
  
! 					if (outerpath == cheapest_startup_outer &&
! 						innerpath == cheapest_total_inner)
! 						continue;		/* already tried it */
  
! 					try_hashjoin_path(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  hashclauses,
! 									  jointype,
! 									  extra);
  				}
  			}
  		}
  
--- 2559,2712 ----
  								  cheapest_total_inner,
  								  hashclauses,
  								  jointype,
! 								  extra,
! 								  false, false);
  		}
  		else
  		{
! 			if (!grouped)
  			{
  				/*
! 				 * For other jointypes, we consider the cheapest startup outer
! 				 * together with the cheapest total inner, and then consider
! 				 * pairings of cheapest-total paths including parameterized
! 				 * ones.  There is no use in generating parameterized paths on
! 				 * the basis of possibly cheap startup cost, so this is
! 				 * sufficient.
  				 */
! 				ListCell   *lc1;
  
! 				if (cheapest_startup_outer != NULL)
! 					try_hashjoin_path(root,
! 									  joinrel,
! 									  cheapest_startup_outer,
! 									  cheapest_total_inner,
! 									  hashclauses,
! 									  jointype,
! 									  extra,
! 									  false, false);
! 
! 				foreach(lc1, outerrel->cheapest_parameterized_paths)
  				{
! 					Path	   *outerpath = (Path *) lfirst(lc1);
! 					ListCell   *lc2;
  
  					/*
! 					 * We cannot use an outer path that is parameterized by the
! 					 * inner rel.
  					 */
! 					if (PATH_PARAM_BY_REL(outerpath, innerrel))
  						continue;
  
! 					foreach(lc2, innerrel->cheapest_parameterized_paths)
! 					{
! 						Path	   *innerpath = (Path *) lfirst(lc2);
  
! 						/*
! 						 * We cannot use an inner path that is parameterized by
! 						 * the outer rel, either.
! 						 */
! 						if (PATH_PARAM_BY_REL(innerpath, outerrel))
! 							continue;
! 
! 						if (outerpath == cheapest_startup_outer &&
! 							innerpath == cheapest_total_inner)
! 							continue;		/* already tried it */
! 
! 						try_hashjoin_path(root,
! 										  joinrel,
! 										  outerpath,
! 										  innerpath,
! 										  hashclauses,
! 										  jointype,
! 										  extra,
! 										  false, false);
! 					}
! 				}
! 			}
! 			else
! 			{
! 				/* Create grouped paths if possible. */
! 				/*
! 				 * TODO
! 				 *
! 				 * Consider processing JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER
! 				 * join types, ie perform grouping of the inner / outer rel if
! 				 * it's not unique yet and if the grouping is legal.
! 				 */
! 				if (jointype == JOIN_UNIQUE_OUTER ||
! 					jointype == JOIN_UNIQUE_INNER)
! 					return;
! 
! 				/*
! 				 * Join grouped relation to non-grouped one.
! 				 *
! 				 * Do not use plain path of the input rel whose target does
! 				 * have GroupedPathInfo. For example (assuming that join of
! 				 * two grouped rels is not supported), the only way to
! 				 * evaluate SELECT sum(a.x), sum(b.y) ... is to join "a" and
! 				 * "b" and aggregate the result. Otherwise the path target
! 				 * wouldn't match joinrel->gpi->target. TODO Move this comment
! 				 * elsewhere as it seems common to all join kinds.
! 				 */
! 				/*
! 				 * TODO Allow outer join if the grouped rel is on the
! 				 * non-nullable side.
! 				 */
! 				if (jointype == JOIN_INNER)
! 				{
! 					Path	*grouped_path, *plain_path;
! 
! 					if (outerrel->gpi != NULL &&
! 						outerrel->gpi->pathlist != NIL &&
! 						innerrel->gpi == NULL)
! 					{
! 						grouped_path = (Path *)
! 							linitial(outerrel->gpi->pathlist);
! 						plain_path = cheapest_total_inner;
! 						try_grouped_hashjoin_path(root, joinrel,
! 												  grouped_path, plain_path,
! 												  hashclauses, jointype,
! 												  extra, false, false);
! 					}
! 					else if (innerrel->gpi != NULL &&
! 							 innerrel->gpi->pathlist != NIL &&
! 							 outerrel->gpi == NULL)
! 					{
! 						grouped_path = (Path *)
! 							linitial(innerrel->gpi->pathlist);
! 						plain_path = cheapest_total_outer;
! 						try_grouped_hashjoin_path(root, joinrel, plain_path,
! 												  grouped_path, hashclauses,
! 												  jointype, extra,
! 												  false, false);
! 
! 						if (cheapest_startup_outer != NULL &&
! 							cheapest_startup_outer != cheapest_total_outer)
! 						{
! 							plain_path = cheapest_startup_outer;
! 							try_grouped_hashjoin_path(root, joinrel,
! 													  plain_path,
! 													  grouped_path,
! 													  hashclauses,
! 													  jointype, extra,
! 													  false, false);
! 						}
! 					}
  				}
+ 
+ 				/*
+ 				 * Try to join plain relations and make a grouped rel out of
+ 				 * the join.
+ 				 *
+ 				 * Since aggregation needs the whole relation, we are only
+ 				 * interested in total costs.
+ 				 */
+ 				try_grouped_hashjoin_path(root, joinrel,
+ 										  cheapest_total_outer,
+ 										  cheapest_total_inner,
+ 										  hashclauses,
+ 										  jointype, extra, true, false);
  			}
  		}
  
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1765,1777 ****
  				cheapest_safe_inner =
  					get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  
! 			if (cheapest_safe_inner != NULL)
! 				try_partial_hashjoin_path(root, joinrel,
! 										  cheapest_partial_outer,
! 										  cheapest_safe_inner,
! 										  hashclauses, jointype, extra);
  		}
  	}
  }
  
  /*
--- 2744,2967 ----
  				cheapest_safe_inner =
  					get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  
! 			if (!grouped)
! 			{
! 				if (cheapest_safe_inner != NULL)
! 					try_partial_hashjoin_path(root, joinrel,
! 											  cheapest_partial_outer,
! 											  cheapest_safe_inner,
! 											  hashclauses, jointype, extra,
! 											  false, false);
! 			}
! 			else if (joinrel->gpi != NULL)
! 			{
! 				/*
! 				 * Grouped partial path.
! 				 *
! 				 * 1. Apply aggregation to the plain partial join path.
! 				 */
! 				if (cheapest_safe_inner != NULL)
! 					try_grouped_hashjoin_path(root, joinrel,
! 											  cheapest_partial_outer,
! 											  cheapest_safe_inner,
! 											  hashclauses,
! 											  jointype, extra, true, true);
! 
! 				/*
! 				 * 2. Join the cheapest partial grouped outer path (if one
! 				 * exists) to cheapest_safe_inner (there's no reason to look
! 				 * for another inner path than what we used for non-grouped
! 				 * partial join path).
! 				 */
! 				if (outerrel->gpi != NULL &&
! 					outerrel->gpi->partial_pathlist != NIL &&
! 					innerrel->gpi == NULL &&
! 					cheapest_safe_inner != NULL)
! 				{
! 					Path	*outer_path;
! 
! 					outer_path = (Path *)
! 						linitial(outerrel->gpi->partial_pathlist);
! 
! 					try_grouped_hashjoin_path(root, joinrel, outer_path,
! 											  cheapest_safe_inner,
! 											  hashclauses,
! 											  jointype, extra, false, true);
! 				}
! 
! 				/*
! 				 * 3. Join the cheapest_partial_outer path (again, no reason
! 				 * to use different outer path than the one we used for plain
! 				 * partial join) to the cheapest grouped inner path if the
! 				 * latter exists and is parallel-safe.
! 				 */
! 				if (innerrel->gpi != NULL &&
! 					innerrel->gpi->pathlist != NIL &&
! 					outerrel->gpi == NULL)
! 				{
! 					Path	*inner_path;
! 
! 					inner_path = (Path *) linitial(innerrel->gpi->pathlist);
! 
! 					if (inner_path->parallel_safe)
! 						try_grouped_hashjoin_path(root, joinrel,
! 												  cheapest_partial_outer,
! 												  inner_path,
! 												  hashclauses,
! 												  jointype, extra,
! 												  false, true);
! 				}
! 
! 				/*
! 				 * Other combinations seem impossible because: 1. At most 1
! 				 * input relation of the join can be grouped, 2. the inner
! 				 * path must not be partial.
! 				 */
! 			}
! 		}
! 	}
! }
! 
! /*
!  * Do the input paths emit all the aggregates contained in the grouped target
!  * of the join?
!  *
!  * The point is that one input relation might be unable to evaluate some
!  * aggregate(s), so it'll only generate plain paths. It's wrong to combine
!  * such plain paths with grouped ones that the other input rel might be able
!  * to generate because the result would miss the aggregate(s) the first
!  * relation failed to evaluate.
!  *
!  * TODO For better efficiency, consider storing Bitmapset of
!  * GroupedVarInfo.gvid in GroupedPathInfo.
!  */
! static bool
! is_grouped_join_target_complete(PlannerInfo *root, PathTarget *jointarget,
! 								Path *outer_path, Path *inner_path)
! {
! 	RelOptInfo	*outer_rel = outer_path->parent;
! 	RelOptInfo	*inner_rel = inner_path->parent;
! 	ListCell	*l1;
! 
! 	/*
! 	 * Join of two grouped relations is not supported.
! 	 *
! 	 * This actually isn't a check of target completeness --- can it be located
! 	 * elsewhere?
! 	 */
! 	if (outer_rel->gpi != NULL && inner_rel->gpi != NULL)
! 		return false;
! 
! 	foreach(l1, jointarget->exprs)
! 	{
! 		Expr	*expr = (Expr *) lfirst(l1);
! 		GroupedVar	*gvar;
! 		GroupedVarInfo	*gvi = NULL;
! 		ListCell	*l2;
! 		bool	found = false;
! 
! 		/* Only interested in aggregates. */
! 		if (!IsA(expr, GroupedVar))
! 			continue;
! 
! 		gvar = castNode(GroupedVar, expr);
! 
! 		/* Find the corresponding GroupedVarInfo. */
! 		foreach(l2, root->grouped_var_list)
! 		{
! 			GroupedVarInfo	*gvi_tmp = castNode(GroupedVarInfo, lfirst(l2));
! 
! 			if (gvi_tmp->gvid == gvar->gvid)
! 			{
! 				gvi = gvi_tmp;
! 				break;
! 			}
! 		}
! 		Assert(gvi != NULL);
! 
! 		/*
! 		 * If any aggregate references both input relations, something went
! 		 * wrong during construction of one of the input targets: one input
! 		 * rel is grouped, but no grouping target should have been created for
! 		 * it if some aggregate required more than that input rel.
! 		 */
! 		Assert(gvi->gv_eval_at == NULL ||
! 			   !(bms_overlap(gvi->gv_eval_at, outer_rel->relids) &&
! 				 bms_overlap(gvi->gv_eval_at, inner_rel->relids)));
! 
! 		/*
! 		 * If the aggregate belongs to the plain relation, it probably
! 		 * means that a non-grouping expression made aggregation of that
! 		 * input relation impossible. Since that expression is not
! 		 * necessarily emitted by the current join, aggregation might be
! 		 * possible here. On the other hand, aggregation of a join which
! 		 * already contains a grouped relation does not seem too
! 		 * beneficial.
! 		 *
! 		 * XXX The condition below is also met if the query contains both
! 		 * "star aggregate" and a normal one. Since the former can be
! 		 * added to any base relation, and since we don't support join of
! 		 * 2 grouped relations, join of arbitrary 2 relations will always
! 		 * result in a plain relation.
! 		 *
! 		 * XXX If we conclude that aggregation is worthwhile, only consider
! 		 * this test failed if a target usable for aggregation cannot be
! 		 * created (i.e. the non-grouping expression is in the output of
! 		 * the current join).
! 		 */
! 		if ((outer_rel->gpi == NULL &&
! 			 bms_overlap(gvi->gv_eval_at, outer_rel->relids))
! 			|| (inner_rel->gpi == NULL &&
! 				bms_overlap(gvi->gv_eval_at, inner_rel->relids)))
! 			return false;
! 
! 		/* Look for the aggregate in the input targets. */
! 		if (outer_rel->gpi != NULL)
! 		{
! 			/* No more than one input path should be grouped. */
! 			Assert(inner_rel->gpi == NULL);
! 
! 			foreach(l2, outer_path->pathtarget->exprs)
! 			{
! 				expr = (Expr *) lfirst(l2);
! 
! 				if (!IsA(expr, GroupedVar))
! 					continue;
! 
! 				gvar = castNode(GroupedVar, expr);
! 				if (gvar->gvid == gvi->gvid)
! 				{
! 					found = true;
! 					break;
! 				}
! 			}
  		}
+ 		else if (!found && inner_rel->gpi != NULL)
+ 		{
+ 			Assert(outer_rel->gpi == NULL);
+ 
+ 			foreach(l2, inner_path->pathtarget->exprs)
+ 			{
+ 				expr = (Expr *) lfirst(l2);
+ 
+ 				if (!IsA(expr, GroupedVar))
+ 					continue;
+ 
+ 				gvar = castNode(GroupedVar, expr);
+ 				if (gvar->gvid == gvi->gvid)
+ 				{
+ 					found = true;
+ 					break;
+ 				}
+ 			}
+ 		}
+ 
+ 		/* Even a single missing aggregate causes the whole test to fail. */
+ 		if (!found)
+ 			return false;
  	}
+ 
+ 	return true;
  }
  
  /*
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
new file mode 100644
index 5a68de3..ea24ed9
*** a/src/backend/optimizer/path/joinrels.c
--- b/src/backend/optimizer/path/joinrels.c
***************
*** 14,23 ****
--- 14,29 ----
   */
  #include "postgres.h"
  
+ #include "miscadmin.h"
+ #include "nodes/relation.h"
+ #include "optimizer/clauses.h"
  #include "optimizer/joininfo.h"
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
+ #include "optimizer/prep.h"
+ #include "optimizer/cost.h"
  #include "utils/memutils.h"
+ #include "utils/lsyscache.h"
  
  
  static void make_rels_by_clause_joins(PlannerInfo *root,
*************** static void make_rels_by_clauseless_join
*** 29,40 ****
  static bool has_join_restriction(PlannerInfo *root, RelOptInfo *rel);
  static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
  static bool is_dummy_rel(RelOptInfo *rel);
- static void mark_dummy_rel(RelOptInfo *rel);
  static bool restriction_is_constant_false(List *restrictlist,
  							  bool only_pushed_down);
  static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
  							RelOptInfo *rel2, RelOptInfo *joinrel,
  							SpecialJoinInfo *sjinfo, List *restrictlist);
  
  
  /*
--- 35,53 ----
  static bool has_join_restriction(PlannerInfo *root, RelOptInfo *rel);
  static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
  static bool is_dummy_rel(RelOptInfo *rel);
  static bool restriction_is_constant_false(List *restrictlist,
  							  bool only_pushed_down);
  static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
  							RelOptInfo *rel2, RelOptInfo *joinrel,
  							SpecialJoinInfo *sjinfo, List *restrictlist);
+ static void try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1,
+ 						  RelOptInfo *rel2, RelOptInfo *joinrel,
+ 						  SpecialJoinInfo *parent_sjinfo,
+ 						  List *parent_restrictlist);
+ static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel);
+ static void build_joinrel_partition_bounds(RelOptInfo *rel1, RelOptInfo *rel2,
+ 							   RelOptInfo *joinrel, JoinType jointype,
+ 							   List **rel1_parts, List **rel2_parts);
  
  
  /*
*************** make_join_rel(PlannerInfo *root, RelOptI
*** 731,736 ****
--- 744,752 ----
  	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
  								restrictlist);
  
+ 	/* Apply partition-wise join technique, if possible. */
+ 	try_partition_wise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+ 
  	bms_free(joinrelids);
  
  	return joinrel;
*************** is_dummy_rel(RelOptInfo *rel)
*** 1197,1203 ****
   * is that the best solution is to explicitly make the dummy path in the same
   * context the given RelOptInfo is in.
   */
! static void
  mark_dummy_rel(RelOptInfo *rel)
  {
  	MemoryContext oldcontext;
--- 1213,1219 ----
   * is that the best solution is to explicitly make the dummy path in the same
   * context the given RelOptInfo is in.
   */
! void
  mark_dummy_rel(RelOptInfo *rel)
  {
  	MemoryContext oldcontext;
*************** mark_dummy_rel(RelOptInfo *rel)
*** 1217,1223 ****
  	rel->partial_pathlist = NIL;
  
  	/* Set up the dummy path */
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
  
  	/* Set or update cheapest_total_path and related fields */
  	set_cheapest(rel);
--- 1233,1239 ----
  	rel->partial_pathlist = NIL;
  
  	/* Set up the dummy path */
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL), false);
  
  	/* Set or update cheapest_total_path and related fields */
  	set_cheapest(rel);
*************** restriction_is_constant_false(List *rest
*** 1268,1270 ****
--- 1284,1712 ----
  	}
  	return false;
  }
+ 
+ /*
+  * Assess whether join between given two partitioned relations can be broken
+  * down into joins between matching partitions; a technique called
+  * "partition-wise join"
+  *
+  * Partition-wise join is possible when (a) the joining relations have the
+  * same partitioning scheme and (b) there exists an equi-join between the
+  * partition keys of the two relations.
+  *
+  * Partition-wise join is planned as follows (details: optimizer/README.)
+  *
+  * 1. Create the RelOptInfos for joins between matching partitions, i.e.
+  * child-joins, and add paths to those.
+  *
+  * 2. Add "append" paths to join between parent relations. The second phase is
+  * implemented by generate_partition_wise_join_paths().
+  *
+  * The RelOptInfo, SpecialJoinInfo and restrictlist for each child join are
+  * obtained by translating the respective parent join structures.
+  */
+ static void
+ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
+ 						RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo,
+ 						List *parent_restrictlist)
+ {
+ 	int			nparts;
+ 	int			cnt_parts;
+ 	ListCell   *lc1;
+ 	ListCell   *lc2;
+ 	List	   *rel1_parts;
+ 	List	   *rel2_parts;
+ 	bool		is_strict;
+ 
+ 	/* Guard against stack overflow due to overly deep partition hierarchy. */
+ 	check_stack_depth();
+ 
+ 	/* Nothing to do, if the join relation is not partitioned. */
+ 	if (!joinrel->part_scheme)
+ 		return;
+ 
+ 	/*
+ 	 * set_append_rel_pathlist() may not create paths in children of an empty
+ 	 * partitioned table and so we can not add paths to child-joins when one
+ 	 * of the joining relations is empty. So, deem such a join as
+ 	 * unpartitioned.
+ 	 */
+ 	if (IS_DUMMY_REL(rel1) || IS_DUMMY_REL(rel2))
+ 		return;
+ 
+ 	/*
+ 	 * Since this join relation is partitioned, all the base relations
+ 	 * participating in this join must be partitioned and so are all the
+ 	 * intermediate join relations.
+ 	 */
+ 	Assert(rel1->part_scheme && rel2->part_scheme);
+ 
+ 	/*
+ 	 * Every pair of joining relations we see here should have an equi-join
+ 	 * between partition keys if this join has been deemed as a partitioned
+ 	 * join. See build_joinrel_partition_info() for reasons.
+ 	 */
+ 	Assert(have_partkey_equi_join(rel1, rel2, parent_sjinfo->jointype,
+ 								  parent_restrictlist, &is_strict));
+ 
+ 	/*
+ 	 * The partition scheme of the join relation should match that of the
+ 	 * joining relations.
+ 	 */
+ 	Assert(joinrel->part_scheme == rel1->part_scheme &&
+ 		   joinrel->part_scheme == rel2->part_scheme);
+ 
+ 	/* We should have RelOptInfos of the partitions available. */
+ 	Assert(rel1->part_rels && rel2->part_rels);
+ 
+ 	/*
+ 	 * Calculate bounds for the join relation. If we can not come up with joint
+ 	 * bounds, we can not use partition-wise join.
+ 	 */
+ 	build_joinrel_partition_bounds(rel1, rel2, joinrel,
+ 								   parent_sjinfo->jointype, &rel1_parts,
+ 								   &rel2_parts);
+ 	if (!joinrel->boundinfo)
+ 		return;
+ 
+ 	Assert(list_length(rel1_parts) == list_length(rel2_parts));
+ 	Assert(joinrel->nparts == list_length(rel1_parts));
+ 	Assert(joinrel->nparts > 0);
+ 
+ 	nparts = joinrel->nparts;
+ 
+ 	elog(DEBUG3, "join between relations %s and %s is considered for partition-wise join.",
+ 		 bmsToString(rel1->relids), bmsToString(rel2->relids));
+ 
+ 	/* Allocate space to hold child-join RelOptInfos, if not already done. */
+ 	if (!joinrel->part_rels)
+ 		joinrel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+ 
+ 	/*
+ 	 * Create child join relations for this partitioned join, if those don't
+ 	 * exist. Add paths to child-joins for a pair of child relations
+ 	 * corresponding to the given pair of parent relations.
+ 	 */
+ 	cnt_parts = 0;
+ 	forboth (lc1, rel1_parts, lc2, rel2_parts)
+ 	{
+ 		RelOptInfo *child_rel1 = lfirst(lc1);
+ 		RelOptInfo *child_rel2 = lfirst(lc2);
+ 		SpecialJoinInfo	*child_sjinfo;
+ 		List   *child_restrictlist;
+ 		RelOptInfo *child_joinrel;
+ 		Relids	child_joinrelids;
+ 		AppendRelInfo **appinfos;
+ 		int		nappinfos;
+ 
+ 		/* We should never try to join two overlapping sets of rels. */
+ 		Assert(!bms_overlap(child_rel1->relids, child_rel2->relids));
+ 		child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids);
+ 		appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos);
+ 
+ 		/*
+ 		 * Construct SpecialJoinInfo from parent join relation's
+ 		 * SpecialJoinInfo.
+ 		 */
+ 		child_sjinfo = build_child_join_sjinfo(root, parent_sjinfo,
+ 											   child_rel1->relids,
+ 											   child_rel2->relids);
+ 
+ 		/*
+ 		 * Construct restrictions applicable to the child join from
+ 		 * those applicable to the parent join.
+ 		 */
+ 		child_restrictlist = (List *) adjust_appendrel_attrs(root,
+ 												  (Node *) parent_restrictlist,
+ 														  nappinfos, appinfos);
+ 
+ 		child_joinrel = joinrel->part_rels[cnt_parts];
+ 		if (!child_joinrel)
+ 		{
+ 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
+ 												 joinrel, child_restrictlist,
+ 												 child_sjinfo,
+ 												 child_sjinfo->jointype);
+ 			joinrel->part_rels[cnt_parts] = child_joinrel;
+ 		}
+ 
+ 		Assert(bms_equal(child_joinrel->relids, child_joinrelids));
+ 
+ 		/* Also translate expressions that AggPath will use in its target. */
+ 		if (child_joinrel->gpi != NULL)
+ 		{
+ 			Assert(child_joinrel->gpi->target != NULL);
+ 
+ 			child_joinrel->gpi->target->exprs =
+ 				(List *) adjust_appendrel_attrs(root,
+ 												(Node *) child_joinrel->gpi->target->exprs,
+ 												nappinfos, appinfos);
+ 		}
+ 
+ 		populate_joinrel_with_paths(root, child_rel1, child_rel2,
+ 									child_joinrel, child_sjinfo,
+ 									child_restrictlist);
+ 
+ 		pfree(appinfos);
+ 
+ 		/*
+ 		 * If the child relations themselves are partitioned, try partition-wise join
+ 		 * recursively.
+ 		 */
+ 		try_partition_wise_join(root, child_rel1, child_rel2, child_joinrel,
+ 								child_sjinfo, child_restrictlist);
+ 		cnt_parts++;
+ 	}
+ }
+ 
+ /*
+  * Returns true if there exists an equi-join condition for each pair of
+  * partition keys from the given relations being joined.
+  */
+ bool
+ have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype,
+ 					   List *restrictlist, bool *is_strict)
+ {
+ 	PartitionScheme	part_scheme = rel1->part_scheme;
+ 	ListCell	*lc;
+ 	int		cnt_pks;
+ 	int		num_pks;
+ 	bool   *pk_has_clause;
+ 
+ 	*is_strict = false;
+ 
+ 	/*
+ 	 * This function should be called when the joining relations have the same
+ 	 * partitioning scheme.
+ 	 */
+ 	Assert(rel1->part_scheme == rel2->part_scheme);
+ 	Assert(part_scheme);
+ 
+ 	num_pks = part_scheme->partnatts;
+ 
+ 	pk_has_clause = (bool *) palloc0(sizeof(bool) * num_pks);
+ 
+ 	foreach (lc, restrictlist)
+ 	{
+ 		RestrictInfo *rinfo = lfirst(lc);
+ 		OpExpr		 *opexpr;
+ 		Expr		 *expr1;
+ 		Expr		 *expr2;
+ 		int		ipk1;
+ 		int		ipk2;
+ 
+ 		/* If processing an outer join, only use its own join clauses. */
+ 		if (IS_OUTER_JOIN(jointype) && rinfo->is_pushed_down)
+ 			continue;
+ 
+ 		/* Skip clauses which can not be used for a join. */
+ 		if (!rinfo->can_join)
+ 			continue;
+ 
+ 		/* Skip clauses which are not equality conditions. */
+ 		if (!rinfo->mergeopfamilies)
+ 			continue;
+ 
+ 		opexpr = (OpExpr *) rinfo->clause;
+ 		Assert(is_opclause(opexpr));
+ 
+ 		/*
+ 		 * The equi-join between partition keys is strict if equi-join between
+ 		 * at least one partition key is using a strict operator. See
+ 		 * explanation about outer join reordering identity 3 in
+ 		 * optimizer/README
+ 		 */
+ 		*is_strict = *is_strict || op_strict(opexpr->opno);
+ 
+ 		/* Match the operands to the relation. */
+ 		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+ 			bms_is_subset(rinfo->right_relids, rel2->relids))
+ 		{
+ 			expr1 = linitial(opexpr->args);
+ 			expr2 = lsecond(opexpr->args);
+ 		}
+ 		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+ 				 bms_is_subset(rinfo->right_relids, rel1->relids))
+ 		{
+ 			expr1 = lsecond(opexpr->args);
+ 			expr2 = linitial(opexpr->args);
+ 		}
+ 		else
+ 			continue;
+ 
+ 		/*
+ 		 * Only clauses referencing the partition keys are useful for
+ 		 * partition-wise join.
+ 		 */
+ 		ipk1 = match_expr_to_partition_keys(expr1, rel1);
+ 		if (ipk1 < 0)
+ 			continue;
+ 		ipk2 = match_expr_to_partition_keys(expr2, rel2);
+ 		if (ipk2 < 0)
+ 			continue;
+ 
+ 		/*
+ 		 * If the clause refers to keys at different ordinal positions in the
+ 		 * partition keys of joining relations, it can not be used for
+ 		 * partition-wise join.
+ 		 */
+ 		if (ipk1 != ipk2)
+ 			continue;
+ 
+ 		/*
+ 		 * The clause allows partition-wise join only if it uses the same
+ 		 * operator family as that specified by the partition key.
+ 		 */
+ 		if (!list_member_oid(rinfo->mergeopfamilies,
+ 							 part_scheme->partopfamily[ipk1]))
+ 			continue;
+ 
+ 		/* Mark the partition key as having an equi-join clause. */
+ 		pk_has_clause[ipk1] = true;
+ 	}
+ 
+ 	/* Check whether every partition key has an equi-join condition. */
+ 	for (cnt_pks = 0; cnt_pks < num_pks; cnt_pks++)
+ 	{
+ 		if (!pk_has_clause[cnt_pks])
+ 		{
+ 			pfree(pk_has_clause);
+ 			return false;
+ 		}
+ 	}
+ 
+ 	pfree(pk_has_clause);
+ 	return true;
+ }
+ 
+ /*
+  * Find the partition key from the given relation matching the given
+  * expression. If found, return the index of the partition key, else return -1.
+  */
+ static int
+ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel)
+ {
+ 	int		cnt_pks;
+ 	int		num_pks;
+ 
+ 	/* This function should be called only for partitioned relations. */
+ 	Assert(rel->part_scheme);
+ 
+ 	num_pks = rel->part_scheme->partnatts;
+ 
+ 	/* Remove the relabel decoration. */
+ 	while (IsA(expr, RelabelType))
+ 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+ 
+ 	for (cnt_pks = 0; cnt_pks < num_pks; cnt_pks++)
+ 	{
+ 		List	 *pkexprs = rel->partexprs[cnt_pks];
+ 		ListCell *lc;
+ 
+ 		foreach(lc, pkexprs)
+ 		{
+ 			Expr *pkexpr = lfirst(lc);
+ 			if (equal(pkexpr, expr))
+ 				return cnt_pks;
+ 		}
+ 	}
+ 
+ 	return -1;
+ }
+ 
+ /*
+  * Calculate the bounds/lists of the join relation based on partition bounds of the
+  * joining relations. Also returns the matching partitions from the joining
+  * relations.
+  *
+  * As of now, it simply checks whether the bounds/lists of the joining
+  * relations match and returns bounds/lists of the first relation. In future
+  * this function will be expanded to merge the bounds/lists from the joining
+  * relations to produce the bounds/lists of the join relation. If the function
+  * fails to merge the bounds/lists, joinrel->boundinfo stays NULL and both lists are NIL.
+  *
+  * The function also returns two lists of RelOptInfos, one for each joining
+  * relation. The RelOptInfos at the same position in each of the lists give the
+  * partitions with matching bounds which can be joined to produce join relation
+  * corresponding to the merged partition bounds corresponding to that position.
+  * When there doesn't exist a matching partition on either side, corresponding
+  * RelOptInfo will be NULL.
+  */
+ static void
+ build_joinrel_partition_bounds(RelOptInfo *rel1, RelOptInfo *rel2,
+ 							   RelOptInfo *joinrel, JoinType jointype,
+ 							   List **rel1_parts, List **rel2_parts)
+ {
+ 	PartitionScheme	part_scheme;
+ 	int			cnt;
+ 	int			nparts;
+ 	int16	   *parttyplen;
+ 	bool	   *parttypbyval;
+ 
+ 	Assert(rel1->part_scheme == rel2->part_scheme);
+ 	Assert(rel1->nparts == rel2->nparts);
+ 	*rel1_parts = NIL;
+ 	*rel2_parts = NIL;
+ 
+ 	part_scheme = rel1->part_scheme;
+ 
+ 	/*
+ 	 * Ideally, we should be able to join two relations which have different
+ 	 * number of partitions as long as the bounds of partitions available on
+ 	 * both the sides match. But for now, we need exact same number of
+ 	 * partitions on both the sides.
+ 	 */
+ 	if (rel1->nparts != rel2->nparts)
+ 	{
+ 		/*
+ 		 * If this pair of joining relations did not have same number of
+ 		 * partitions no other pair can have same number of partitions.
+ 		 */
+ 		Assert(!joinrel->boundinfo && joinrel->nparts == 0);
+ 		return;
+ 	}
+ 
+ 
+ 	parttyplen = (int16 *) palloc(sizeof(int16) * part_scheme->partnatts);
+ 	parttypbyval = (bool *) palloc(sizeof(bool) * part_scheme->partnatts);
+ 	for (cnt = 0; cnt < part_scheme->partnatts; cnt++)
+ 		get_typlenbyval(part_scheme->partopcintype[cnt], &parttyplen[cnt],
+ 						&parttypbyval[cnt]);
+ 
+ 	if (!partition_bounds_equal(part_scheme->partnatts, parttyplen,
+ 								parttypbyval, rel1->boundinfo,
+ 								rel2->boundinfo))
+ 	{
+ 		/*
+ 		 * If this pair of joining relations did not have same partition bounds
+ 		 * no other pair can have same partition bounds.
+ 		 */
+ 		Assert(!joinrel->boundinfo && joinrel->nparts == 0);
+ 		return;
+ 	}
+ 
+ 	nparts = rel1->nparts;
+ 	for (cnt = 0; cnt < nparts; cnt++)
+ 	{
+ 		*rel1_parts = lappend(*rel1_parts, rel1->part_rels[cnt]);
+ 		*rel2_parts = lappend(*rel2_parts, rel2->part_rels[cnt]);
+ 	}
+ 
+ 	/* Set the partition bounds if not already set. */
+ 	if (!joinrel->boundinfo)
+ 	{
+ 		joinrel->boundinfo = rel1->boundinfo;
+ 		joinrel->nparts = rel1->nparts;
+ 	}
+ 	else
+ 	{
+ 		/* Verify existing bounds. */
+ 		Assert(partition_bounds_equal(part_scheme->partnatts, parttyplen,
+ 									  parttypbyval, joinrel->boundinfo,
+ 									  rel1->boundinfo));
+ 		Assert(joinrel->nparts == rel1->nparts);
+ 	}
+ 
+ 	pfree(parttyplen);
+ 	pfree(parttypbyval);
+ }
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
new file mode 100644
index a2fe661..91d855c
*** a/src/backend/optimizer/path/tidpath.c
--- b/src/backend/optimizer/path/tidpath.c
*************** create_tidscan_paths(PlannerInfo *root,
*** 266,270 ****
  
  	if (tidquals)
  		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
! 												   required_outer));
  }
--- 266,270 ----
  
  	if (tidquals)
  		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
! 												   required_outer), false);
  }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
new file mode 100644
index 95e6eb7..3f1389f
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** static Plan *prepare_sort_from_pathkeys(
*** 252,258 ****
  static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec,
  					   TargetEntry *tle,
  					   Relids relids);
! static Sort *make_sort_from_pathkeys(Plan *lefttree, List *pathkeys);
  static Sort *make_sort_from_groupcols(List *groupcls,
  						 AttrNumber *grpColIdx,
  						 Plan *lefttree);
--- 252,259 ----
  static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec,
  					   TargetEntry *tle,
  					   Relids relids);
! static Sort *make_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
! 									 Relids relids);
  static Sort *make_sort_from_groupcols(List *groupcls,
  						 AttrNumber *grpColIdx,
  						 Plan *lefttree);
*************** create_sort_plan(PlannerInfo *root, Sort
*** 1650,1656 ****
  	subplan = create_plan_recurse(root, best_path->subpath,
  								  flags | CP_SMALL_TLIST);
  
! 	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys);
  
  	copy_generic_path_info(&plan->plan, (Path *) best_path);
  
--- 1651,1657 ----
  	subplan = create_plan_recurse(root, best_path->subpath,
  								  flags | CP_SMALL_TLIST);
  
! 	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
  
  	copy_generic_path_info(&plan->plan, (Path *) best_path);
  
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3767,3772 ****
--- 3768,3775 ----
  	ListCell   *lc;
  	ListCell   *lop;
  	ListCell   *lip;
+ 	Path	   *outer_path = best_path->jpath.outerjoinpath;
+ 	Path	   *inner_path = best_path->jpath.innerjoinpath;
  
  	/*
  	 * MergeJoin can project, so we don't have to demand exact tlists from the
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3830,3837 ****
  	 */
  	if (best_path->outersortkeys)
  	{
  		Sort	   *sort = make_sort_from_pathkeys(outer_plan,
! 												   best_path->outersortkeys);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		outer_plan = (Plan *) sort;
--- 3833,3842 ----
  	 */
  	if (best_path->outersortkeys)
  	{
+ 		Relids		outer_relids = outer_path->parent->relids;
  		Sort	   *sort = make_sort_from_pathkeys(outer_plan,
! 												   best_path->outersortkeys,
! 												   outer_relids);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		outer_plan = (Plan *) sort;
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3842,3849 ****
  
  	if (best_path->innersortkeys)
  	{
  		Sort	   *sort = make_sort_from_pathkeys(inner_plan,
! 												   best_path->innersortkeys);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		inner_plan = (Plan *) sort;
--- 3847,3856 ----
  
  	if (best_path->innersortkeys)
  	{
+ 		Relids		inner_relids = inner_path->parent->relids;
  		Sort	   *sort = make_sort_from_pathkeys(inner_plan,
! 												   best_path->innersortkeys,
! 												   inner_relids);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		inner_plan = (Plan *) sort;
*************** prepare_sort_from_pathkeys(Plan *lefttre
*** 5687,5697 ****
  					continue;
  
  				/*
! 				 * Ignore child members unless they match the rel being
  				 * sorted.
  				 */
  				if (em->em_is_child &&
! 					!bms_equal(em->em_relids, relids))
  					continue;
  
  				sortexpr = em->em_expr;
--- 5694,5704 ----
  					continue;
  
  				/*
! 				 * Ignore child members unless they belong to the rel being
  				 * sorted.
  				 */
  				if (em->em_is_child &&
! 					!bms_is_subset(em->em_relids, relids))
  					continue;
  
  				sortexpr = em->em_expr;
*************** find_ec_member_for_tle(EquivalenceClass
*** 5803,5812 ****
  			continue;
  
  		/*
! 		 * Ignore child members unless they match the rel being sorted.
  		 */
  		if (em->em_is_child &&
! 			!bms_equal(em->em_relids, relids))
  			continue;
  
  		/* Match if same expression (after stripping relabel) */
--- 5810,5819 ----
  			continue;
  
  		/*
! 		 * Ignore child members unless they belong to the rel being sorted.
  		 */
  		if (em->em_is_child &&
! 			!bms_is_subset(em->em_relids, relids))
  			continue;
  
  		/* Match if same expression (after stripping relabel) */
*************** find_ec_member_for_tle(EquivalenceClass
*** 5827,5835 ****
   *
   *	  'lefttree' is the node which yields input tuples
   *	  'pathkeys' is the list of pathkeys by which the result is to be sorted
   */
  static Sort *
! make_sort_from_pathkeys(Plan *lefttree, List *pathkeys)
  {
  	int			numsortkeys;
  	AttrNumber *sortColIdx;
--- 5834,5843 ----
   *
   *	  'lefttree' is the node which yields input tuples
   *	  'pathkeys' is the list of pathkeys by which the result is to be sorted
+  *	  'relids' is the set of relations required by prepare_sort_from_pathkeys()
   */
  static Sort *
! make_sort_from_pathkeys(Plan *lefttree, List *pathkeys, Relids relids)
  {
  	int			numsortkeys;
  	AttrNumber *sortColIdx;
*************** make_sort_from_pathkeys(Plan *lefttree,
*** 5839,5845 ****
  
  	/* Compute sort column info, and adjust lefttree as needed */
  	lefttree = prepare_sort_from_pathkeys(lefttree, pathkeys,
! 										  NULL,
  										  NULL,
  										  false,
  										  &numsortkeys,
--- 5847,5853 ----
  
  	/* Compute sort column info, and adjust lefttree as needed */
  	lefttree = prepare_sort_from_pathkeys(lefttree, pathkeys,
! 										  relids,
  										  NULL,
  										  false,
  										  &numsortkeys,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
new file mode 100644
index ebd442a..0313c71
*** a/src/backend/optimizer/plan/initsplan.c
--- b/src/backend/optimizer/plan/initsplan.c
***************
*** 14,20 ****
--- 14,22 ----
   */
  #include "postgres.h"
  
+ #include "access/sysattr.h"
  #include "catalog/pg_type.h"
+ #include "catalog/pg_class.h"
  #include "nodes/nodeFuncs.h"
  #include "optimizer/clauses.h"
  #include "optimizer/cost.h"
***************
*** 26,31 ****
--- 28,34 ----
  #include "optimizer/planner.h"
  #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/analyze.h"
  #include "rewrite/rewriteManip.h"
*************** typedef struct PostponedQual
*** 45,50 ****
--- 48,54 ----
  } PostponedQual;
  
  
+ static void create_grouped_var_infos(PlannerInfo *root);
  static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
  						   Index rtindex);
  static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
*************** add_vars_to_targetlist(PlannerInfo *root
*** 240,245 ****
--- 244,533 ----
  	}
  }
  
+ /*
+  * Add GroupedVarInfo to grouped_var_list for each aggregate and set up
+  * GroupedPathInfo for each base relation that can produce grouped paths.
+  *
+  * XXX In the future we might want to create GroupedVarInfo for grouping
+  * expressions too, so that grouping key is not limited to plain Var if the
+  * grouping takes place below the top-level join.
+  *
+  * root->group_pathkeys must be set up before this function is called.
+  */
+ extern void
+ add_grouping_info_to_base_rels(PlannerInfo *root)
+ {
+ 	int			i;
+ 
+ 	/* No grouping in the query? */
+ 	if (!root->parse->groupClause || root->group_pathkeys == NIL)
+ 		return;
+ 
+ 	/* TODO This is just for PoC. Relax the limitation later. */
+ 	if (root->parse->havingQual)
+ 		return;
+ 
+ 	/* Create GroupedVarInfo per (distinct) aggregate. */
+ 	create_grouped_var_infos(root);
+ 
+ 	/* Is no grouping possible below the top-level join? */
+ 	if (root->grouped_var_list == NIL)
+ 		return;
+ 
+ 	/* Process the individual base relations. */
+ 	for (i = 1; i < root->simple_rel_array_size; i++)
+ 	{
+ 		RelOptInfo	*rel = root->simple_rel_array[i];
+ 
+ 		/*
+ 		 * "other rels" will have their targets built later, by translation of
+ 		 * the target of the parent rel - see set_append_rel_size. If we
+ 		 * wanted to prepare the child rels here, we'd need another iteration
+ 		 * over simple_rel_array.
+ 		 */
+ 		if (rel != NULL && rel->reloptkind == RELOPT_BASEREL)
+ 			prepare_rel_for_grouping(root, rel);
+ 	}
+ }
+ 
+ /*
+  * Create GroupedVarInfo for each distinct aggregate.
+  *
+  * If any aggregate is not suitable, set root->grouped_var_list to NIL and
+  * return.
+  *
+  * TODO Include aggregates from HAVING clause.
+  */
+ static void
+ create_grouped_var_infos(PlannerInfo *root)
+ {
+ 	List	   *tlist_exprs;
+ 	ListCell	*lc;
+ 
+ 	Assert(root->grouped_var_list == NIL);
+ 
+ 	/*
+ 	 * TODO Check if processed_tlist contains the HAVING aggregates. If not,
+ 	 * get them elsewhere.
+ 	 */
+ 	tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ 								  PVC_INCLUDE_AGGREGATES);
+ 	if (tlist_exprs == NIL)
+ 		return;
+ 
+ 	/* tlist_exprs may also contain Vars, but we only need Aggrefs. */
+ 	foreach(lc, tlist_exprs)
+ 	{
+ 		Expr	*expr = (Expr *) lfirst(lc);
+ 		Aggref	*aggref;
+ 		ListCell	*lc2;
+ 		GroupedVarInfo	*gvi;
+ 		bool	exists;
+ 
+ 		if (IsA(expr, Var))
+ 			continue;
+ 
+ 		aggref = castNode(Aggref, expr);
+ 
+ 		/* TODO Think if (some of) these can be handled. */
+ 		if (aggref->aggvariadic ||
+ 			aggref->aggdirectargs || aggref->aggorder ||
+ 			aggref->aggdistinct || aggref->aggfilter)
+ 		{
+ 			/*
+ 			 * Partial aggregation is not useful if at least one aggregate
+ 			 * cannot be evaluated below the top-level join.
+ 			 *
+ 			 * XXX Is it worth freeing the GroupedVarInfos and their subtrees?
+ 			 */
+ 			root->grouped_var_list = NIL;
+ 			break;
+ 		}
+ 
+ 		/* Does GroupedVarInfo for this aggregate already exist? */
+ 		exists = false;
+ 		foreach(lc2, root->grouped_var_list)
+ 		{
+ 			Expr	*expr = (Expr *) lfirst(lc2);
+ 
+ 			gvi = castNode(GroupedVarInfo, expr);
+ 
+ 			if (equal(aggref, gvi->gvexpr))
+ 			{
+ 				exists = true;
+ 				break;
+ 			}
+ 		}
+ 
+ 		/* Construct a new GroupedVarInfo if it does not exist yet. */
+ 		if (!exists)
+ 		{
+ 			Relids	relids;
+ 
+ 			/* TODO Initialize gv_width. */
+ 			gvi = makeNode(GroupedVarInfo);
+ 
+ 			gvi->gvid = list_length(root->grouped_var_list);
+ 			gvi->gvexpr = (Expr *) copyObject(aggref);
+ 			gvi->agg_partial = copyObject(aggref);
+ 			mark_partial_aggref(gvi->agg_partial, AGGSPLIT_INITIAL_SERIAL);
+ 
+ 			/* Find out where the aggregate should be evaluated. */
+ 			relids = pull_varnos((Node *) aggref);
+ 			if (!bms_is_empty(relids))
+ 				gvi->gv_eval_at = relids;
+ 			else
+ 			{
+ 				Assert(aggref->aggstar);
+ 				gvi->gv_eval_at = NULL;
+ 			}
+ 
+ 			root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+ 		}
+ 	}
+ 
+ 	list_free(tlist_exprs);
+ }
+ 
+ /*
+  * Check if all the expressions of rel->reltarget can be used as grouping
+  * expressions and create target for grouped paths.
+  *
+  * If we succeed in creating the grouping target, also replace rel->reltarget
+  * with a new one that has sortgrouprefs initialized -- this is necessary for
+  * create_agg_plan to match the grouping clauses against the input target
+  * expressions.
+  *
+  * rel_agg_attrs is a set of attributes of the relation referenced by aggregate
+  * arguments. These can exist in the (plain) target without being grouping
+  * expressions.
+  *
+  * rel_agg_vars should be passed instead if rel is a join.
+  *
+  * TODO How about PHVs?
+  *
+  * TODO Make sure cost / width of both "result" and "plain" are correct.
+  */
+ PathTarget *
+ create_grouped_target(PlannerInfo *root, RelOptInfo *rel,
+ 					  Relids rel_agg_attrs, List *rel_agg_vars)
+ {
+ 	PathTarget	*result, *plain;
+ 	ListCell	*lc;
+ 
+ 	/* The grouped target to be returned. */
+ 	result = create_empty_pathtarget();
+ 	/* The one to replace rel->reltarget. */
+ 	plain = create_empty_pathtarget();
+ 
+ 	foreach(lc, rel->reltarget->exprs)
+ 	{
+ 		Expr		*texpr;
+ 		Index		sortgroupref;
+ 		bool		agg_arg_only = false;
+ 
+ 		texpr = (Expr *) lfirst(lc);
+ 
+ 		sortgroupref = get_expr_sortgroupref(root, texpr);
+ 		if (sortgroupref > 0)
+ 		{
+ 			/* It's o.k. to use the target expression for grouping. */
+ 			add_column_to_pathtarget(result, texpr, sortgroupref);
+ 
+ 			/*
+ 			 * As for the plain target, add the original expression but set
+ 			 * sortgroupref in addition.
+ 			 */
+ 			add_column_to_pathtarget(plain, texpr, sortgroupref);
+ 
+ 			/* Process the next expression. */
+ 			continue;
+ 		}
+ 
+ 		/*
+ 		 * It may still be o.k. if the expression is only contained in Aggref
+ 		 * - then it's not expected in the grouped output.
+ 		 *
+ 		 * TODO Try to handle generic expression, not only Var. That might
+ 		 * require us to create rel->reltarget of the grouping rel in
+ 		 * parallel to that of the plain rel, and adding whole expressions
+ 		 * instead of individual vars.
+ 		 */
+ 		if (IsA(texpr, Var))
+ 		{
+ 			Var	*arg_var = castNode(Var, texpr);
+ 
+ 			if (rel->relid > 0)
+ 			{
+ 				AttrNumber	varattno;
+ 
+ 				/*
+ 				 * For a single relation we only need to check attribute
+ 				 * number.
+ 				 *
+ 				 * Apply the same offset that pull_varattnos() did.
+ 				 */
+ 				varattno = arg_var->varattno - FirstLowInvalidHeapAttributeNumber;
+ 
+ 				if (bms_is_member(varattno, rel_agg_attrs))
+ 					agg_arg_only = true;
+ 			}
+ 			else
+ 			{
+ 				ListCell	*lc2;
+ 
+ 				/* Join case. */
+ 				foreach(lc2, rel_agg_vars)
+ 				{
+ 					Var	*var = castNode(Var, lfirst(lc2));
+ 
+ 					if (var->varno == arg_var->varno &&
+ 						var->varattno == arg_var->varattno)
+ 					{
+ 						agg_arg_only = true;
+ 						break;
+ 					}
+ 				}
+ 			}
+ 
+ 			if (agg_arg_only)
+ 			{
+ 				/*
+ 				 * This expression is not suitable for grouping, but the
+ 				 * aggregation input target ought to stay complete.
+ 				 */
+ 				add_column_to_pathtarget(plain, texpr, 0);
+ 			}
+ 		}
+ 
+ 		/*
+ 		 * A single mismatched expression makes the whole relation useless
+ 		 * for grouping.
+ 		 */
+ 		if (!agg_arg_only)
+ 		{
+ 			/*
+ 			 * TODO This can apparently happen multiple times per relation,
+ 			 * so result might be worth freeing. Implement free_pathtarget()?
+ 			 * Or mark the relation as inappropriate for grouping?
+ 			 */
+ 			/* TODO Free both result and plain. */
+ 			return NULL;
+ 		}
+ 	}
+ 
+ 	if (list_length(result->exprs) == 0)
+ 	{
+ 		/* TODO free_pathtarget(result); free_pathtarget(plain) */
+ 		result = NULL;
+ 	}
+ 
+ 	/* Apply the adjusted input target now that the replacement is complete. */
+ 	rel->reltarget = plain;
+ 
+ 	return result;
+ }
+ 
  
  /*****************************************************************************
   *
*************** create_lateral_join_info(PlannerInfo *ro
*** 629,639 ****
  	for (rti = 1; rti < root->simple_rel_array_size; rti++)
  	{
  		RelOptInfo *brel = root->simple_rel_array[rti];
  
! 		if (brel == NULL || brel->reloptkind != RELOPT_BASEREL)
  			continue;
  
! 		if (root->simple_rte_array[rti]->inh)
  		{
  			foreach(lc, root->append_rel_list)
  			{
--- 917,941 ----
  	for (rti = 1; rti < root->simple_rel_array_size; rti++)
  	{
  		RelOptInfo *brel = root->simple_rel_array[rti];
+ 		RangeTblEntry *brte = root->simple_rte_array[rti];
  
! 		if (brel == NULL)
  			continue;
  
! 		/*
! 		 * If an "other rel" RTE is a "partitioned table", we must propagate
! 		 * the lateral info inherited all the way from the root parent to its
! 		 * children. That's because the children are not linked directly with
! 		 * the root parent via AppendRelInfos, unlike in the case of a regular
! 		 * inheritance set (see expand_inherited_rtentry()).  Failing to
! 		 * do this would result in those children not getting marked with the
! 		 * appropriate lateral info.
! 		 */
! 		if (brel->reloptkind != RELOPT_BASEREL &&
! 			brte->relkind != RELKIND_PARTITIONED_TABLE)
! 			continue;
! 
! 		if (brte->inh)
  		{
  			foreach(lc, root->append_rel_list)
  			{
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
new file mode 100644
index 5565736..058af2c
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** preprocess_minmax_aggregates(PlannerInfo
*** 223,229 ****
  			 create_minmaxagg_path(root, grouped_rel,
  								   create_pathtarget(root, tlist),
  								   aggs_list,
! 								   (List *) parse->havingQual));
  }
  
  /*
--- 223,229 ----
  			 create_minmaxagg_path(root, grouped_rel,
  								   create_pathtarget(root, tlist),
  								   aggs_list,
! 								   (List *) parse->havingQual), false);
  }
  
  /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
new file mode 100644
index ef0de3f..f70b445
*** a/src/backend/optimizer/plan/planmain.c
--- b/src/backend/optimizer/plan/planmain.c
*************** query_planner(PlannerInfo *root, List *t
*** 83,89 ****
  		add_path(final_rel, (Path *)
  				 create_result_path(root, final_rel,
  									final_rel->reltarget,
! 									(List *) parse->jointree->quals));
  
  		/* Select cheapest path (pretty easy in this case...) */
  		set_cheapest(final_rel);
--- 83,89 ----
  		add_path(final_rel, (Path *)
  				 create_result_path(root, final_rel,
  									final_rel->reltarget,
! 									(List *) parse->jointree->quals), false);
  
  		/* Select cheapest path (pretty easy in this case...) */
  		set_cheapest(final_rel);
*************** query_planner(PlannerInfo *root, List *t
*** 114,119 ****
--- 114,120 ----
  	root->full_join_clauses = NIL;
  	root->join_info_list = NIL;
  	root->placeholder_list = NIL;
+ 	root->grouped_var_list = NIL;
  	root->fkey_list = NIL;
  	root->initial_rels = NIL;
  
*************** query_planner(PlannerInfo *root, List *t
*** 177,182 ****
--- 178,191 ----
  	(*qp_callback) (root, qp_extra);
  
  	/*
+ 	 * If the query result can be grouped, check if any grouping can be
+ 	 * performed below the top-level join. If so, initialize GroupedPathInfo
+ 	 * of the base relations capable of grouping and set up
+ 	 * root->grouped_var_list.
+ 	 */
+ 	add_grouping_info_to_base_rels(root);
+ 
+ 	/*
  	 * Examine any "placeholder" expressions generated during subquery pullup.
  	 * Make sure that the Vars they need are marked as needed at the relevant
  	 * join level.  This must be done before join removal because it might
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 649a233..d47f635
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** typedef struct
*** 108,117 ****
--- 108,135 ----
  	int		   *tleref_to_colnum_map;
  } grouping_sets_data;
  
+ /* Result of a given invocation of inheritance_planner_guts() */
+ typedef struct
+ {
+ 	Index 	nominalRelation;
+ 	List   *partitioned_rels;
+ 	List   *resultRelations;
+ 	List   *subpaths;
+ 	List   *subroots;
+ 	List   *withCheckOptionLists;
+ 	List   *returningLists;
+ 	List   *final_rtable;
+ 	List   *init_plans;
+ 	int		save_rel_array_size;
+ 	RelOptInfo **save_rel_array;
+ } inheritance_planner_result;
+ 
  /* Local functions */
  static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
  static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
  static void inheritance_planner(PlannerInfo *root);
+ static void inheritance_planner_guts(PlannerInfo *root,
+ 						 inheritance_planner_result *inhpres);
  static void grouping_planner(PlannerInfo *root, bool inheritance_update,
  				 double tuple_fraction);
  static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
*************** static void standard_qp_callback(Planner
*** 130,138 ****
  static double get_number_of_groups(PlannerInfo *root,
  					 double path_rows,
  					 grouping_sets_data *gd);
- static Size estimate_hashagg_tablesize(Path *path,
- 						   const AggClauseCosts *agg_costs,
- 						   double dNumGroups);
  static RelOptInfo *create_grouping_paths(PlannerInfo *root,
  					  RelOptInfo *input_rel,
  					  PathTarget *target,
--- 148,153 ----
*************** preprocess_phv_expression(PlannerInfo *r
*** 1020,1044 ****
  static void
  inheritance_planner(PlannerInfo *root)
  {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
  	Bitmapset  *subqueryRTindexes;
  	Bitmapset  *modifiableARIindexes;
! 	int			nominalRelation = -1;
! 	List	   *final_rtable = NIL;
! 	int			save_rel_array_size = 0;
! 	RelOptInfo **save_rel_array = NULL;
! 	List	   *subpaths = NIL;
! 	List	   *subroots = NIL;
! 	List	   *resultRelations = NIL;
! 	List	   *withCheckOptionLists = NIL;
! 	List	   *returningLists = NIL;
! 	List	   *rowMarks;
! 	RelOptInfo *final_rel;
  	ListCell   *lc;
  	Index		rti;
  	RangeTblEntry *parent_rte;
- 	List		  *partitioned_rels = NIL;
  
  	Assert(parse->commandType != CMD_INSERT);
  
--- 1035,1139 ----
  static void
  inheritance_planner(PlannerInfo *root)
  {
+ 	inheritance_planner_result inhpres;
+ 	Query	   *parse = root->parse;
+ 	RelOptInfo *final_rel;
+ 	Index		rti;
+ 	int			final_rtable_len;
+ 	ListCell   *lc;
+ 	List	   *rowMarks;
+ 
+ 	/*
+ 	 * Away we go... Although the inheritance hierarchy to be processed might
+ 	 * be represented in a non-flat manner, some of the elements needed to
+ 	 * create the final ModifyTable path are always returned in a flat list
+ 	 * structure.
+ 	 */
+ 	memset(&inhpres, 0, sizeof(inhpres));
+ 	inheritance_planner_guts(root, &inhpres);
+ 
+ 	/* Result path must go into outer query's FINAL upperrel */
+ 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
+ 
+ 	/*
+ 	 * We don't currently worry about setting final_rel's consider_parallel
+ 	 * flag in this case, nor about allowing FDWs or create_upper_paths_hook
+ 	 * to get control here.
+ 	 */
+ 
+ 	/*
+ 	 * If we managed to exclude every child rel, return a dummy plan; it
+ 	 * doesn't even need a ModifyTable node.
+ 	 */
+ 	if (inhpres.subpaths == NIL)
+ 	{
+ 		set_dummy_rel_pathlist(final_rel);
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Put back the final adjusted rtable into the master copy of the Query.
+ 	 * (We mustn't do this if we found no non-excluded children.)
+ 	 */
+ 	parse->rtable = inhpres.final_rtable;
+ 	root->simple_rel_array_size = inhpres.save_rel_array_size;
+ 	root->simple_rel_array = inhpres.save_rel_array;
+ 	/* Must reconstruct master's simple_rte_array, too */
+ 	final_rtable_len = list_length(inhpres.final_rtable);
+ 	root->simple_rte_array = (RangeTblEntry **)
+ 								palloc0((final_rtable_len + 1) *
+ 											sizeof(RangeTblEntry *));
+ 	rti = 1;
+ 	foreach(lc, inhpres.final_rtable)
+ 	{
+ 		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+ 
+ 		root->simple_rte_array[rti++] = rte;
+ 	}
+ 
+ 	/*
+ 	 * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
+ 	 * have dealt with fetching non-locked marked rows, else we need to have
+ 	 * ModifyTable do that.
+ 	 */
+ 	if (parse->rowMarks)
+ 		rowMarks = NIL;
+ 	else
+ 		rowMarks = root->rowMarks;
+ 
+ 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
+ 	add_path(final_rel, (Path *)
+ 			 create_modifytable_path(root, final_rel,
+ 									 parse->commandType,
+ 									 parse->canSetTag,
+ 									 inhpres.nominalRelation,
+ 									 inhpres.partitioned_rels,
+ 									 inhpres.resultRelations,
+ 									 inhpres.subpaths,
+ 									 inhpres.subroots,
+ 									 inhpres.withCheckOptionLists,
+ 									 inhpres.returningLists,
+ 									 rowMarks,
+ 									 NULL,
+ 									 SS_assign_special_param(root)), false);
+ }
+ 
+ /*
+  * inheritance_planner_guts
+  *	  Recursive guts of inheritance_planner
+  */
+ static void
+ inheritance_planner_guts(PlannerInfo *root,
+ 						 inheritance_planner_result *inhpres)
+ {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
  	Bitmapset  *subqueryRTindexes;
  	Bitmapset  *modifiableARIindexes;
! 	bool		nominalRelationSet = false;
  	ListCell   *lc;
  	Index		rti;
  	RangeTblEntry *parent_rte;
  
  	Assert(parse->commandType != CMD_INSERT);
  
*************** inheritance_planner(PlannerInfo *root)
*** 1106,1112 ****
  	 */
  	parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
! 		nominalRelation = parentRTindex;
  
  	/*
  	 * And now we can get on with generating a plan for each child table.
--- 1201,1210 ----
  	 */
  	parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
! 	{
! 		inhpres->nominalRelation = parentRTindex;
! 		nominalRelationSet = true;
! 	}
  
  	/*
  	 * And now we can get on with generating a plan for each child table.
*************** inheritance_planner(PlannerInfo *root)
*** 1115,1120 ****
--- 1213,1219 ----
  	{
  		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
  		PlannerInfo *subroot;
+ 		Index	childRTindex = appinfo->child_relid;
  		RangeTblEntry *child_rte;
  		RelOptInfo *sub_final_rel;
  		Path	   *subpath;
*************** inheritance_planner(PlannerInfo *root)
*** 1136,1152 ****
  		 * references to the parent RTE to refer to the current child RTE,
  		 * then fool around with subquery RTEs.
  		 */
! 		subroot->parse = (Query *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) parse,
! 								   appinfo);
  
  		/*
  		 * If there are securityQuals attached to the parent, move them to the
  		 * child rel (they've already been transformed properly for that).
  		 */
  		parent_rte = rt_fetch(parentRTindex, subroot->parse->rtable);
! 		child_rte = rt_fetch(appinfo->child_relid, subroot->parse->rtable);
  		child_rte->securityQuals = parent_rte->securityQuals;
  		parent_rte->securityQuals = NIL;
  
--- 1235,1249 ----
  		 * references to the parent RTE to refer to the current child RTE,
  		 * then fool around with subquery RTEs.
  		 */
! 		subroot->parse = (Query *) adjust_appendrel_attrs(root, (Node *) parse,
! 														  1, &appinfo);
  
  		/*
  		 * If there are securityQuals attached to the parent, move them to the
  		 * child rel (they've already been transformed properly for that).
  		 */
  		parent_rte = rt_fetch(parentRTindex, subroot->parse->rtable);
! 		child_rte = rt_fetch(childRTindex, subroot->parse->rtable);
  		child_rte->securityQuals = parent_rte->securityQuals;
  		parent_rte->securityQuals = NIL;
  
*************** inheritance_planner(PlannerInfo *root)
*** 1191,1197 ****
  		 * These won't be referenced, so there's no need to make them very
  		 * valid-looking.
  		 */
! 		while (list_length(subroot->parse->rtable) < list_length(final_rtable))
  			subroot->parse->rtable = lappend(subroot->parse->rtable,
  											 makeNode(RangeTblEntry));
  
--- 1288,1295 ----
  		 * These won't be referenced, so there's no need to make them very
  		 * valid-looking.
  		 */
! 		while (list_length(subroot->parse->rtable) <
! 										list_length(inhpres->final_rtable))
  			subroot->parse->rtable = lappend(subroot->parse->rtable,
  											 makeNode(RangeTblEntry));
  
*************** inheritance_planner(PlannerInfo *root)
*** 1203,1209 ****
  		 * since subquery RTEs couldn't contain any references to the target
  		 * rel.
  		 */
! 		if (final_rtable != NIL && subqueryRTindexes != NULL)
  		{
  			ListCell   *lr;
  
--- 1301,1307 ----
  		 * since subquery RTEs couldn't contain any references to the target
  		 * rel.
  		 */
! 		if (inhpres->final_rtable != NIL && subqueryRTindexes != NULL)
  		{
  			ListCell   *lr;
  
*************** inheritance_planner(PlannerInfo *root)
*** 1248,1253 ****
--- 1346,1392 ----
  			}
  		}
  
+ 		/*
+ 		 * Recurse for a partitioned child table.  We shouldn't be planning
+ 		 * a partitioned RTE as a child member, which is what the code after
+ 		 * this block does.
+ 		 */
+ 		if (child_rte->inh)
+ 		{
+ 			inheritance_planner_result	child_inhpres;
+ 
+ 			Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
+ 
+ 			/* During the recursive invocation, this child is the parent. */
+ 			subroot->parse->resultRelation = childRTindex;
+ 			memset(&child_inhpres, 0, sizeof(child_inhpres));
+ 			inheritance_planner_guts(subroot, &child_inhpres);
+ 
+ 			inhpres->partitioned_rels = list_concat(inhpres->partitioned_rels,
+ 											child_inhpres.partitioned_rels);
+ 			inhpres->resultRelations = list_concat(inhpres->resultRelations,
+ 											child_inhpres.resultRelations);
+ 			inhpres->subpaths = list_concat(inhpres->subpaths,
+ 											child_inhpres.subpaths);
+ 			inhpres->subroots = list_concat(inhpres->subroots,
+ 											child_inhpres.subroots);
+ 			inhpres->withCheckOptionLists =
+ 									list_concat(inhpres->withCheckOptionLists,
+ 										child_inhpres.withCheckOptionLists);
+ 			inhpres->returningLists = list_concat(inhpres->returningLists,
+ 											child_inhpres.returningLists);
+ 			if (child_inhpres.final_rtable != NIL)
+ 				inhpres->final_rtable = child_inhpres.final_rtable;
+ 			if (child_inhpres.init_plans != NIL)
+ 				inhpres->init_plans = child_inhpres.init_plans;
+ 			if (child_inhpres.save_rel_array_size != 0)
+ 			{
+ 				inhpres->save_rel_array_size = child_inhpres.save_rel_array_size;
+ 				inhpres->save_rel_array = child_inhpres.save_rel_array;
+ 			}
+ 			continue;
+ 		}
+ 
  		/* There shouldn't be any OJ info to translate, as yet */
  		Assert(subroot->join_info_list == NIL);
  		/* and we haven't created PlaceHolderInfos, either */
*************** inheritance_planner(PlannerInfo *root)
*** 1279,1286 ****
  		 * the duplicate child RTE added for the parent does not appear
  		 * anywhere else in the plan tree.
  		 */
! 		if (nominalRelation < 0)
! 			nominalRelation = appinfo->child_relid;
  
  		/*
  		 * Select cheapest path in case there's more than one.  We always run
--- 1418,1428 ----
  		 * the duplicate child RTE added for the parent does not appear
  		 * anywhere else in the plan tree.
  		 */
! 		if (!nominalRelationSet)
! 		{
! 			inhpres->nominalRelation = childRTindex;
! 			nominalRelationSet = true;
! 		}
  
  		/*
  		 * Select cheapest path in case there's more than one.  We always run
*************** inheritance_planner(PlannerInfo *root)
*** 1303,1314 ****
  		 * becomes the initial contents of final_rtable; otherwise, append
  		 * just its modified subquery RTEs to final_rtable.
  		 */
! 		if (final_rtable == NIL)
! 			final_rtable = subroot->parse->rtable;
  		else
! 			final_rtable = list_concat(final_rtable,
! 									   list_copy_tail(subroot->parse->rtable,
! 												 list_length(final_rtable)));
  
  		/*
  		 * We need to collect all the RelOptInfos from all child plans into
--- 1445,1456 ----
  		 * becomes the initial contents of final_rtable; otherwise, append
  		 * just its modified subquery RTEs to final_rtable.
  		 */
! 		if (inhpres->final_rtable == NIL)
! 			inhpres->final_rtable = subroot->parse->rtable;
  		else
! 			inhpres->final_rtable = list_concat(inhpres->final_rtable,
! 										list_copy_tail(subroot->parse->rtable,
! 										 list_length(inhpres->final_rtable)));
  
  		/*
  		 * We need to collect all the RelOptInfos from all child plans into
*************** inheritance_planner(PlannerInfo *root)
*** 1317,1425 ****
  		 * have to propagate forward the RelOptInfos that were already built
  		 * in previous children.
  		 */
! 		Assert(subroot->simple_rel_array_size >= save_rel_array_size);
! 		for (rti = 1; rti < save_rel_array_size; rti++)
  		{
! 			RelOptInfo *brel = save_rel_array[rti];
  
  			if (brel)
  				subroot->simple_rel_array[rti] = brel;
  		}
! 		save_rel_array_size = subroot->simple_rel_array_size;
! 		save_rel_array = subroot->simple_rel_array;
  
  		/* Make sure any initplans from this rel get into the outer list */
! 		root->init_plans = subroot->init_plans;
  
  		/* Build list of sub-paths */
! 		subpaths = lappend(subpaths, subpath);
  
  		/* Build list of modified subroots, too */
! 		subroots = lappend(subroots, subroot);
  
  		/* Build list of target-relation RT indexes */
! 		resultRelations = lappend_int(resultRelations, appinfo->child_relid);
  
  		/* Build lists of per-relation WCO and RETURNING targetlists */
  		if (parse->withCheckOptions)
! 			withCheckOptionLists = lappend(withCheckOptionLists,
! 										   subroot->parse->withCheckOptions);
  		if (parse->returningList)
! 			returningLists = lappend(returningLists,
! 									 subroot->parse->returningList);
! 
  		Assert(!parse->onConflict);
  	}
  
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		partitioned_rels = get_partitioned_child_rels(root, parentRTindex);
  		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(partitioned_rels) >= 1);
! 	}
! 
! 	/* Result path must go into outer query's FINAL upperrel */
! 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
! 
! 	/*
! 	 * We don't currently worry about setting final_rel's consider_parallel
! 	 * flag in this case, nor about allowing FDWs or create_upper_paths_hook
! 	 * to get control here.
! 	 */
! 
! 	/*
! 	 * If we managed to exclude every child rel, return a dummy plan; it
! 	 * doesn't even need a ModifyTable node.
! 	 */
! 	if (subpaths == NIL)
! 	{
! 		set_dummy_rel_pathlist(final_rel);
! 		return;
! 	}
! 
! 	/*
! 	 * Put back the final adjusted rtable into the master copy of the Query.
! 	 * (We mustn't do this if we found no non-excluded children.)
! 	 */
! 	parse->rtable = final_rtable;
! 	root->simple_rel_array_size = save_rel_array_size;
! 	root->simple_rel_array = save_rel_array;
! 	/* Must reconstruct master's simple_rte_array, too */
! 	root->simple_rte_array = (RangeTblEntry **)
! 		palloc0((list_length(final_rtable) + 1) * sizeof(RangeTblEntry *));
! 	rti = 1;
! 	foreach(lc, final_rtable)
! 	{
! 		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
! 
! 		root->simple_rte_array[rti++] = rte;
  	}
- 
- 	/*
- 	 * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
- 	 * have dealt with fetching non-locked marked rows, else we need to have
- 	 * ModifyTable do that.
- 	 */
- 	if (parse->rowMarks)
- 		rowMarks = NIL;
- 	else
- 		rowMarks = root->rowMarks;
- 
- 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
- 	add_path(final_rel, (Path *)
- 			 create_modifytable_path(root, final_rel,
- 									 parse->commandType,
- 									 parse->canSetTag,
- 									 nominalRelation,
- 									 partitioned_rels,
- 									 resultRelations,
- 									 subpaths,
- 									 subroots,
- 									 withCheckOptionLists,
- 									 returningLists,
- 									 rowMarks,
- 									 NULL,
- 									 SS_assign_special_param(root)));
  }
  
  /*--------------------
--- 1459,1506 ----
  		 * have to propagate forward the RelOptInfos that were already built
  		 * in previous children.
  		 */
! 		Assert(subroot->simple_rel_array_size >= inhpres->save_rel_array_size);
! 		for (rti = 1; rti < inhpres->save_rel_array_size; rti++)
  		{
! 			RelOptInfo *brel = inhpres->save_rel_array[rti];
  
  			if (brel)
  				subroot->simple_rel_array[rti] = brel;
  		}
! 		inhpres->save_rel_array_size = subroot->simple_rel_array_size;
! 		inhpres->save_rel_array = subroot->simple_rel_array;
  
  		/* Make sure any initplans from this rel get into the outer list */
! 		inhpres->init_plans = subroot->init_plans;
  
  		/* Build list of sub-paths */
! 		inhpres->subpaths = lappend(inhpres->subpaths, subpath);
  
  		/* Build list of modified subroots, too */
! 		inhpres->subroots = lappend(inhpres->subroots, subroot);
  
  		/* Build list of target-relation RT indexes */
! 		inhpres->resultRelations = lappend_int(inhpres->resultRelations,
! 											   childRTindex);
  
  		/* Build lists of per-relation WCO and RETURNING targetlists */
  		if (parse->withCheckOptions)
! 			inhpres->withCheckOptionLists =
! 										lappend(inhpres->withCheckOptionLists,
! 											subroot->parse->withCheckOptions);
  		if (parse->returningList)
! 			inhpres->returningLists = lappend(inhpres->returningLists,
! 											  subroot->parse->returningList);
  		Assert(!parse->onConflict);
  	}
  
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		inhpres->partitioned_rels = get_partitioned_child_rels(root,
! 															parentRTindex);
  		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(inhpres->partitioned_rels) >= 1);
  	}
  }
  
  /*--------------------
*************** grouping_planner(PlannerInfo *root, bool
*** 2040,2046 ****
  		}
  
  		/* And shove it into final_rel */
! 		add_path(final_rel, path);
  	}
  
  	/*
--- 2121,2127 ----
  		}
  
  		/* And shove it into final_rel */
! 		add_path(final_rel, path, false);
  	}
  
  	/*
*************** get_number_of_groups(PlannerInfo *root,
*** 3446,3485 ****
  }
  
  /*
-  * estimate_hashagg_tablesize
-  *	  estimate the number of bytes that a hash aggregate hashtable will
-  *	  require based on the agg_costs, path width and dNumGroups.
-  *
-  * XXX this may be over-estimating the size now that hashagg knows to omit
-  * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
-  * grouping columns not in the hashed set are counted here even though hashagg
-  * won't store them. Is this a problem?
-  */
- static Size
- estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
- 						   double dNumGroups)
- {
- 	Size		hashentrysize;
- 
- 	/* Estimate per-hash-entry space at tuple width... */
- 	hashentrysize = MAXALIGN(path->pathtarget->width) +
- 		MAXALIGN(SizeofMinimalTupleHeader);
- 
- 	/* plus space for pass-by-ref transition values... */
- 	hashentrysize += agg_costs->transitionSpace;
- 	/* plus the per-hash-entry overhead */
- 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
- 
- 	/*
- 	 * Note that this disregards the effect of fill-factor and growth policy
- 	 * of the hash-table. That's probably ok, given default the default
- 	 * fill-factor is relatively high. It'd be hard to meaningfully factor in
- 	 * "double-in-size" growth policies here.
- 	 */
- 	return hashentrysize * dNumGroups;
- }
- 
- /*
   * create_grouping_paths
   *
   * Build a new upperrel containing Paths for grouping and/or aggregation.
--- 3527,3532 ----
*************** create_grouping_paths(PlannerInfo *root,
*** 3600,3606 ****
  								   (List *) parse->havingQual);
  		}
  
! 		add_path(grouped_rel, path);
  
  		/* No need to consider any other alternatives. */
  		set_cheapest(grouped_rel);
--- 3647,3653 ----
  								   (List *) parse->havingQual);
  		}
  
! 		add_path(grouped_rel, path, false);
  
  		/* No need to consider any other alternatives. */
  		set_cheapest(grouped_rel);
*************** create_grouping_paths(PlannerInfo *root,
*** 3777,3783 ****
  														 parse->groupClause,
  														 NIL,
  														 &agg_partial_costs,
! 														 dNumPartialGroups));
  					else
  						add_partial_path(grouped_rel, (Path *)
  										 create_group_path(root,
--- 3824,3831 ----
  														 parse->groupClause,
  														 NIL,
  														 &agg_partial_costs,
! 														 dNumPartialGroups),
! 							false);
  					else
  						add_partial_path(grouped_rel, (Path *)
  										 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 3786,3792 ****
  													 partial_grouping_target,
  														   parse->groupClause,
  														   NIL,
! 														 dNumPartialGroups));
  				}
  			}
  		}
--- 3834,3841 ----
  													 partial_grouping_target,
  														   parse->groupClause,
  														   NIL,
! 														   dNumPartialGroups),
! 										 false);
  				}
  			}
  		}
*************** create_grouping_paths(PlannerInfo *root,
*** 3817,3823 ****
  												 parse->groupClause,
  												 NIL,
  												 &agg_partial_costs,
! 												 dNumPartialGroups));
  			}
  		}
  	}
--- 3866,3873 ----
  												 parse->groupClause,
  												 NIL,
  												 &agg_partial_costs,
! 												 dNumPartialGroups),
! 								 false);
  			}
  		}
  	}
*************** create_grouping_paths(PlannerInfo *root,
*** 3869,3875 ****
  											 parse->groupClause,
  											 (List *) parse->havingQual,
  											 agg_costs,
! 											 dNumGroups));
  				}
  				else if (parse->groupClause)
  				{
--- 3919,3925 ----
  											 parse->groupClause,
  											 (List *) parse->havingQual,
  											 agg_costs,
! 											 dNumGroups), false);
  				}
  				else if (parse->groupClause)
  				{
*************** create_grouping_paths(PlannerInfo *root,
*** 3884,3890 ****
  											   target,
  											   parse->groupClause,
  											   (List *) parse->havingQual,
! 											   dNumGroups));
  				}
  				else
  				{
--- 3934,3940 ----
  											   target,
  											   parse->groupClause,
  											   (List *) parse->havingQual,
! 											   dNumGroups), false);
  				}
  				else
  				{
*************** create_grouping_paths(PlannerInfo *root,
*** 3933,3939 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups));
  			else
  				add_path(grouped_rel, (Path *)
  						 create_group_path(root,
--- 3983,3989 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups), false);
  			else
  				add_path(grouped_rel, (Path *)
  						 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 3942,3948 ****
  										   target,
  										   parse->groupClause,
  										   (List *) parse->havingQual,
! 										   dNumGroups));
  
  			/*
  			 * The point of using Gather Merge rather than Gather is that it
--- 3992,3998 ----
  										   target,
  										   parse->groupClause,
  										   (List *) parse->havingQual,
! 										   dNumGroups), false);
  
  			/*
  			 * The point of using Gather Merge rather than Gather is that it
*************** create_grouping_paths(PlannerInfo *root,
*** 3995,4001 ****
  												 parse->groupClause,
  												 (List *) parse->havingQual,
  												 &agg_final_costs,
! 												 dNumGroups));
  					else
  						add_path(grouped_rel, (Path *)
  								 create_group_path(root,
--- 4045,4051 ----
  												 parse->groupClause,
  												 (List *) parse->havingQual,
  												 &agg_final_costs,
! 												 dNumGroups), false);
  					else
  						add_path(grouped_rel, (Path *)
  								 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 4004,4010 ****
  												   target,
  												   parse->groupClause,
  												   (List *) parse->havingQual,
! 												   dNumGroups));
  				}
  			}
  		}
--- 4054,4060 ----
  												   target,
  												   parse->groupClause,
  												   (List *) parse->havingQual,
! 												   dNumGroups), false);
  				}
  			}
  		}
*************** create_grouping_paths(PlannerInfo *root,
*** 4049,4055 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 agg_costs,
! 										 dNumGroups));
  			}
  		}
  
--- 4099,4105 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 agg_costs,
! 										 dNumGroups), false);
  			}
  		}
  
*************** create_grouping_paths(PlannerInfo *root,
*** 4087,4095 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups));
  			}
  		}
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
--- 4137,4212 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups), false);
  			}
  		}
+ 
+ 		/*
+ 		 * If input_rel has partially aggregated partial paths, gather them
+ 		 * and perform the final aggregation.
+ 		 *
+ 		 * TODO Allow havingQual - currently not supported at base relation
+ 		 * level.
+ 		 */
+ 		if (input_rel->gpi != NULL &&
+ 			input_rel->gpi->partial_pathlist != NIL &&
+ 			!parse->havingQual)
+ 		{
+ 			Path	   *path = (Path *) linitial(input_rel->gpi->partial_pathlist);
+ 			double		total_groups = path->rows * path->parallel_workers;
+ 
+ 			path = (Path *) create_gather_path(root,
+ 											   input_rel,
+ 											   path,
+ 											   path->pathtarget,
+ 											   NULL,
+ 											   &total_groups);
+ 
+ 			/*
+ 			 * The input path is partially aggregated and the final
+ 			 * aggregation - if the path wins - will be done below. So we're
+ 			 * done with it for now.
+ 			 *
+ 			 * The top-level grouped_rel needs to receive the path into the
+ 			 * regular pathlist, as opposed to grouped_rel->gpi->pathlist.
+ 			 */
+ 			add_path(input_rel, path, false);
+ 		}
+ 
+ 		/*
+ 		 * If input_rel has partially aggregated paths, perform the final
+ 		 * aggregation.
+ 		 *
+ 		 * TODO Allow havingQual - currently not supported at base relation
+ 		 * level.
+ 		 */
+ 		if (input_rel->gpi != NULL && input_rel->gpi->pathlist != NIL &&
+ 			!parse->havingQual)
+ 		{
+ 			Path *pre_agg = (Path *) linitial(input_rel->gpi->pathlist);
+ 
+ 			dNumGroups = get_number_of_groups(root, pre_agg->rows, gd);
+ 
+ 			MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
+ 			get_agg_clause_costs(root, (Node *) target->exprs,
+ 								 AGGSPLIT_FINAL_DESERIAL,
+ 								 &agg_final_costs);
+ 			get_agg_clause_costs(root, parse->havingQual,
+ 								 AGGSPLIT_FINAL_DESERIAL,
+ 								 &agg_final_costs);
+ 
+ 			add_path(grouped_rel,
+ 					 (Path *) create_agg_path(root, grouped_rel,
+ 											  pre_agg,
+ 											  target,
+ 											  AGG_HASHED,
+ 											  AGGSPLIT_FINAL_DESERIAL,
+ 											  parse->groupClause,
+ 											  (List *) parse->havingQual,
+ 											  &agg_final_costs,
+ 											  dNumGroups),
+ 					 false);
+ 		}
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
*************** consider_groupingsets_paths(PlannerInfo
*** 4289,4295 ****
  										  strat,
  										  new_rollups,
  										  agg_costs,
! 										  dNumGroups));
  		return;
  	}
  
--- 4406,4412 ----
  										  strat,
  										  new_rollups,
  										  agg_costs,
! 										  dNumGroups), false);
  		return;
  	}
  
*************** consider_groupingsets_paths(PlannerInfo
*** 4447,4453 ****
  											  AGG_MIXED,
  											  rollups,
  											  agg_costs,
! 											  dNumGroups));
  		}
  	}
  
--- 4564,4570 ----
  											  AGG_MIXED,
  											  rollups,
  											  agg_costs,
! 											  dNumGroups), false);
  		}
  	}
  
*************** consider_groupingsets_paths(PlannerInfo
*** 4464,4470 ****
  										  AGG_SORTED,
  										  gd->rollups,
  										  agg_costs,
! 										  dNumGroups));
  }
  
  /*
--- 4581,4587 ----
  										  AGG_SORTED,
  										  gd->rollups,
  										  agg_costs,
! 										  dNumGroups), false);
  }
  
  /*
*************** create_one_window_path(PlannerInfo *root
*** 4649,4655 ****
  								  window_pathkeys);
  	}
  
! 	add_path(window_rel, path);
  }
  
  /*
--- 4766,4772 ----
  								  window_pathkeys);
  	}
  
! 	add_path(window_rel, path, false);
  }
  
  /*
*************** create_distinct_paths(PlannerInfo *root,
*** 4755,4761 ****
  						 create_upper_unique_path(root, distinct_rel,
  												  path,
  										list_length(root->distinct_pathkeys),
! 												  numDistinctRows));
  			}
  		}
  
--- 4872,4878 ----
  						 create_upper_unique_path(root, distinct_rel,
  												  path,
  										list_length(root->distinct_pathkeys),
! 												  numDistinctRows), false);
  			}
  		}
  
*************** create_distinct_paths(PlannerInfo *root,
*** 4782,4788 ****
  				 create_upper_unique_path(root, distinct_rel,
  										  path,
  										list_length(root->distinct_pathkeys),
! 										  numDistinctRows));
  	}
  
  	/*
--- 4899,4905 ----
  				 create_upper_unique_path(root, distinct_rel,
  										  path,
  										list_length(root->distinct_pathkeys),
! 										  numDistinctRows), false);
  	}
  
  	/*
*************** create_distinct_paths(PlannerInfo *root,
*** 4829,4835 ****
  								 parse->distinctClause,
  								 NIL,
  								 NULL,
! 								 numDistinctRows));
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
--- 4946,4952 ----
  								 parse->distinctClause,
  								 NIL,
  								 NULL,
! 								 numDistinctRows), false);
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
*************** create_ordered_paths(PlannerInfo *root,
*** 4927,4933 ****
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path);
  		}
  	}
  
--- 5044,5050 ----
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path, false);
  		}
  	}
  
*************** create_ordered_paths(PlannerInfo *root,
*** 4977,4983 ****
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path);
  		}
  	}
  
--- 5094,5100 ----
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path, false);
  		}
  	}
  
*************** get_partitioned_child_rels(PlannerInfo *
*** 6083,6085 ****
--- 6200,6230 ----
  
  	return result;
  }
+ 
+ /*
+  * get_partitioned_child_rels_for_join
+  *		Build and return a list containing the RTI of every partitioned
+  *		relation which is a child of some rel included in the join.
+  *
+  * Note: Only call this function on joins between partitioned tables.
+  */
+ List *
+ get_partitioned_child_rels_for_join(PlannerInfo *root,
+ 									RelOptInfo *joinrel)
+ {
+ 	List	   *result = NIL;
+ 	ListCell   *l;
+ 
+ 	foreach(l, root->pcinfo_list)
+ 	{
+ 		PartitionedChildRelInfo	*pc = lfirst(l);
+ 
+ 		if (bms_is_member(pc->parent_relid, joinrel->relids))
+ 			result = list_concat(result, list_copy(pc->child_rels));
+ 	}
+ 
+ 	/* The root partitioned table is included as a child rel */
+ 	Assert(list_length(result) >= bms_num_members(joinrel->relids));
+ 
+ 	return result;
+ }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
new file mode 100644
index 1278371..44c3919
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** typedef struct
*** 40,46 ****
--- 40,50 ----
  	List	   *tlist;			/* underlying target list */
  	int			num_vars;		/* number of plain Var tlist entries */
  	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
+ 	bool		has_grp_vars;	/* are there GroupedVar entries? */
  	bool		has_non_vars;	/* are there other entries? */
+ 	bool		has_conv_whole_rows;	/* are there ConvertRowtypeExpr entries
+ 										 * encapsulating a whole-row Var?
+ 										 */
  	tlist_vinfo vars[FLEXIBLE_ARRAY_MEMBER];	/* has num_vars entries */
  } indexed_tlist;
  
*************** static List *set_returning_clause_refere
*** 139,144 ****
--- 143,149 ----
  								int rtoffset);
  static bool extract_query_dependencies_walker(Node *node,
  								  PlannerInfo *context);
+ static Var *get_wholerow_ref_from_convert_row_type(Node *node);
  
  /*****************************************************************************
   *
*************** set_upper_references(PlannerInfo *root,
*** 1725,1733 ****
--- 1730,1781 ----
  	indexed_tlist *subplan_itlist;
  	List	   *output_targetlist;
  	ListCell   *l;
+ 	List	*sub_tlist_save = NIL;
+ 
+ 	if (root->grouped_var_list != NIL)
+ 	{
+ 		if (IsA(plan, Agg))
+ 		{
+ 			Agg	*agg = (Agg *) plan;
+ 
+ 			if (agg->aggsplit == AGGSPLIT_FINAL_DESERIAL)
+ 			{
+ 				/*
+ 				 * convert_combining_aggrefs could have replaced some vars
+ 				 * with Aggref expressions representing the partial
+ 				 * aggregation. We need to restore the same Aggrefs in the
+ 				 * subplan targetlist, but this would break the subplan if
+ 				 * it's something else than the partial aggregation (i.e. the
+ 				 * partial aggregation takes place lower in the plan tree). So
+ 				 * we'll eventually need to restore the original list.
+ 				 */
+ 				if (!IsA(subplan, Agg))
+ 					sub_tlist_save = subplan->targetlist;
+ #ifdef USE_ASSERT_CHECKING
+ 				else
+ 					Assert(((Agg *) subplan)->aggsplit == AGGSPLIT_INITIAL_SERIAL);
+ #endif	/* USE_ASSERT_CHECKING */
+ 
+ 				/*
+ 				 * Restore the aggregate expressions that we might have
+ 				 * removed when planning for aggregation at base relation
+ 				 * level.
+ 				 */
+ 				subplan->targetlist =
+ 					restore_grouping_expressions(root, subplan->targetlist);
+ 			}
+ 		}
+ 	}
  
  	subplan_itlist = build_tlist_index(subplan->targetlist);
  
+ 	/*
+ 	 * The replacement of GroupedVars by Aggrefs was only needed for the
+ 	 * index build.
+ 	 */
+ 	if (sub_tlist_save != NIL)
+ 		subplan->targetlist = sub_tlist_save;
+ 
  	output_targetlist = NIL;
  	foreach(l, plan->targetlist)
  	{
*************** build_tlist_index(List *tlist)
*** 1937,1943 ****
--- 1985,1993 ----
  
  	itlist->tlist = tlist;
  	itlist->has_ph_vars = false;
+ 	itlist->has_grp_vars = false;
  	itlist->has_non_vars = false;
+ 	itlist->has_conv_whole_rows = false;
  
  	/* Find the Vars and fill in the index array */
  	vinfo = itlist->vars;
*************** build_tlist_index(List *tlist)
*** 1956,1961 ****
--- 2006,2015 ----
  		}
  		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
  			itlist->has_ph_vars = true;
+ 		else if (tle->expr && IsA(tle->expr, GroupedVar))
+ 			itlist->has_grp_vars = true;
+ 		else if (get_wholerow_ref_from_convert_row_type((Node *) tle->expr))
+ 			itlist->has_conv_whole_rows = true;
  		else
  			itlist->has_non_vars = true;
  	}
*************** build_tlist_index(List *tlist)
*** 1971,1977 ****
   * This is like build_tlist_index, but we only index tlist entries that
   * are Vars belonging to some rel other than the one specified.  We will set
   * has_ph_vars (allowing PlaceHolderVars to be matched), but not has_non_vars
!  * (so nothing other than Vars and PlaceHolderVars can be matched).
   */
  static indexed_tlist *
  build_tlist_index_other_vars(List *tlist, Index ignore_rel)
--- 2025,2034 ----
   * This is like build_tlist_index, but we only index tlist entries that
   * are Vars belonging to some rel other than the one specified.  We will set
   * has_ph_vars (allowing PlaceHolderVars to be matched), but not has_non_vars
!  * (so nothing other than Vars and PlaceHolderVars can be matched). In case of
!  * DML, where this function will be used, returning lists from child relations
!  * will be appended similar to a simple append relation. That does not require
!  * fixing ConvertRowtypeExpr references. So, those are not considered here.
   */
  static indexed_tlist *
  build_tlist_index_other_vars(List *tlist, Index ignore_rel)
*************** build_tlist_index_other_vars(List *tlist
*** 1988,1993 ****
--- 2045,2051 ----
  	itlist->tlist = tlist;
  	itlist->has_ph_vars = false;
  	itlist->has_non_vars = false;
+ 	itlist->has_conv_whole_rows = false;
  
  	/* Find the desired Vars and fill in the index array */
  	vinfo = itlist->vars;
*************** fix_join_expr_mutator(Node *node, fix_jo
*** 2233,2238 ****
--- 2291,2321 ----
  		/* No referent found for Var */
  		elog(ERROR, "variable not found in subplan target lists");
  	}
+ 	if (IsA(node, GroupedVar))
+ 	{
+ 		GroupedVar *gvar = (GroupedVar *) node;
+ 
+ 		/* See if the GroupedVar has bubbled up from a lower plan node */
+ 		if (context->outer_itlist && context->outer_itlist->has_grp_vars)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) gvar,
+ 													  context->outer_itlist,
+ 													  OUTER_VAR);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 		if (context->inner_itlist && context->inner_itlist->has_grp_vars)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) gvar,
+ 													  context->inner_itlist,
+ 													  INNER_VAR);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 
+ 		/* No referent found for GroupedVar */
+ 		elog(ERROR, "grouped variable not found in subplan target lists");
+ 	}
  	if (IsA(node, PlaceHolderVar))
  	{
  		PlaceHolderVar *phv = (PlaceHolderVar *) node;
*************** fix_join_expr_mutator(Node *node, fix_jo
*** 2258,2263 ****
--- 2341,2369 ----
  		/* If not supplied by input plans, evaluate the contained expr */
  		return fix_join_expr_mutator((Node *) phv->phexpr, context);
  	}
+ 	if (get_wholerow_ref_from_convert_row_type(node))
+ 	{
+ 		if (context->outer_itlist &&
+ 			context->outer_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													 context->outer_itlist,
+ 																OUTER_VAR);
+ 
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 		if (context->inner_itlist &&
+ 			context->inner_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													 context->inner_itlist,
+ 																INNER_VAR);
+ 
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 	}
  	if (IsA(node, Param))
  		return fix_param_node(context->root, (Param *) node);
  	/* Try matching more complex expressions too, if tlists have any */
*************** fix_upper_expr_mutator(Node *node, fix_u
*** 2364,2369 ****
--- 2470,2486 ----
  		/* If not supplied by input plan, evaluate the contained expr */
  		return fix_upper_expr_mutator((Node *) phv->phexpr, context);
  	}
+ 	if (get_wholerow_ref_from_convert_row_type(node))
+ 	{
+ 		if (context->subplan_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													  context->subplan_itlist,
+ 													  context->newvarno);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 	}
  	if (IsA(node, Param))
  		return fix_param_node(context->root, (Param *) node);
  	if (IsA(node, Aggref))
*************** fix_upper_expr_mutator(Node *node, fix_u
*** 2389,2395 ****
  		/* If no match, just fall through to process it normally */
  	}
  	/* Try matching more complex expressions too, if tlist has any */
! 	if (context->subplan_itlist->has_non_vars)
  	{
  		newvar = search_indexed_tlist_for_non_var((Expr *) node,
  												  context->subplan_itlist,
--- 2506,2513 ----
  		/* If no match, just fall through to process it normally */
  	}
  	/* Try matching more complex expressions too, if tlist has any */
! 	if (context->subplan_itlist->has_grp_vars ||
! 		context->subplan_itlist->has_non_vars)
  	{
  		newvar = search_indexed_tlist_for_non_var((Expr *) node,
  												  context->subplan_itlist,
*************** extract_query_dependencies_walker(Node *
*** 2596,2598 ****
--- 2714,2748 ----
  	return expression_tree_walker(node, extract_query_dependencies_walker,
  								  (void *) context);
  }
+ 
+ /*
+  * get_wholerow_ref_from_convert_row_type
+  *		Given a node, check if it's a ConvertRowtypeExpr encapsulating a
+  *		whole-row reference as implicit cast and return the whole-row
+  *		reference Var if so. Otherwise return NULL. In case of multi-level
+  *		partitioning, we will have as many nested ConvertRowtypeExpr as there
+  *		are levels in partition hierarchy.
+  */
+ static Var *
+ get_wholerow_ref_from_convert_row_type(Node *node)
+ {
+ 	Var		   *var = NULL;
+ 	ConvertRowtypeExpr *convexpr;
+ 
+ 	if (!node || !IsA(node, ConvertRowtypeExpr))
+ 		return NULL;
+ 
+ 	/* Traverse nested ConvertRowtypeExpr's. */
+ 	convexpr = castNode(ConvertRowtypeExpr, node);
+ 	while (convexpr->convertformat == COERCE_IMPLICIT_CAST &&
+ 		   IsA(convexpr->arg, ConvertRowtypeExpr))
+ 		convexpr = (ConvertRowtypeExpr *) convexpr->arg;
+ 
+ 	if (IsA(convexpr->arg, Var))
+ 		var = castNode(Var, convexpr->arg);
+ 
+ 	if (var && var->varattno == 0)
+ 		return var;
+ 
+ 	return NULL;
+ }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
new file mode 100644
index a1be858..8bdaa44
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 55,61 ****
  typedef struct
  {
  	PlannerInfo *root;
! 	AppendRelInfo *appinfo;
  } adjust_appendrel_attrs_context;
  
  static Path *recurse_set_operations(Node *setOp, PlannerInfo *root,
--- 55,62 ----
  typedef struct
  {
  	PlannerInfo *root;
! 	int		nappinfos;
! 	AppendRelInfo **appinfos;
  } adjust_appendrel_attrs_context;
  
  static Path *recurse_set_operations(Node *setOp, PlannerInfo *root,
*************** static List *generate_append_tlist(List
*** 97,103 ****
  					  List *input_tlists,
  					  List *refnames_tlist);
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
! static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
  static void make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
--- 98,104 ----
  					  List *input_tlists,
  					  List *refnames_tlist);
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
! static List *expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
  static void make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
*************** static Bitmapset *translate_col_privs(co
*** 107,113 ****
  					List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context);
- static Relids adjust_relid_set(Relids relids, Index oldrelid, Index newrelid);
  static List *adjust_inherited_tlist(List *tlist,
  					   AppendRelInfo *context);
  
--- 108,113 ----
*************** plan_set_operations(PlannerInfo *root)
*** 207,213 ****
  	root->processed_tlist = top_tlist;
  
  	/* Add only the final path to the SETOP upperrel. */
! 	add_path(setop_rel, path);
  
  	/* Let extensions possibly add some more paths */
  	if (create_upper_paths_hook)
--- 207,213 ----
  	root->processed_tlist = top_tlist;
  
  	/* Add only the final path to the SETOP upperrel. */
! 	add_path(setop_rel, path, false);
  
  	/* Let extensions possibly add some more paths */
  	if (create_upper_paths_hook)
*************** expand_inherited_tables(PlannerInfo *roo
*** 1330,1348 ****
  	Index		nrtes;
  	Index		rti;
  	ListCell   *rl;
  
  	/*
  	 * expand_inherited_rtentry may add RTEs to parse->rtable; there is no
  	 * need to scan them since they can't have inh=true.  So just scan as far
  	 * as the original end of the rtable list.
  	 */
! 	nrtes = list_length(root->parse->rtable);
! 	rl = list_head(root->parse->rtable);
  	for (rti = 1; rti <= nrtes; rti++)
  	{
  		RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);
  
! 		expand_inherited_rtentry(root, rte, rti);
  		rl = lnext(rl);
  	}
  }
--- 1330,1351 ----
  	Index		nrtes;
  	Index		rti;
  	ListCell   *rl;
+ 	Query	   *parse = root->parse;
  
  	/*
  	 * expand_inherited_rtentry may add RTEs to parse->rtable; there is no
  	 * need to scan them since they can't have inh=true.  So just scan as far
  	 * as the original end of the rtable list.
  	 */
! 	nrtes = list_length(parse->rtable);
! 	rl = list_head(parse->rtable);
  	for (rti = 1; rti <= nrtes; rti++)
  	{
  		RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);
+ 		List		  *appinfos;
  
! 		appinfos = expand_inherited_rtentry(root, rte, rti);
! 		root->append_rel_list = list_concat(root->append_rel_list, appinfos);
  		rl = lnext(rl);
  	}
  }
*************** expand_inherited_tables(PlannerInfo *roo
*** 1362,1369 ****
   *
   * A childless table is never considered to be an inheritance set; therefore
   * a parent RTE must always have at least two associated AppendRelInfos.
   */
! static void
  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  {
  	Query	   *parse = root->parse;
--- 1365,1374 ----
   *
   * A childless table is never considered to be an inheritance set; therefore
   * a parent RTE must always have at least two associated AppendRelInfos.
+  *
+  * Returns a list of AppendRelInfos, or NIL.
   */
! static List *
  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  {
  	Query	   *parse = root->parse;
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1380,1391 ****
  
  	/* Does RT entry allow inheritance? */
  	if (!rte->inh)
! 		return;
  	/* Ignore any already-expanded UNION ALL nodes */
  	if (rte->rtekind != RTE_RELATION)
  	{
  		Assert(rte->rtekind == RTE_SUBQUERY);
! 		return;
  	}
  	/* Fast path for common case of childless table */
  	parentOID = rte->relid;
--- 1385,1396 ----
  
  	/* Does RT entry allow inheritance? */
  	if (!rte->inh)
! 		return NIL;
  	/* Ignore any already-expanded UNION ALL nodes */
  	if (rte->rtekind != RTE_RELATION)
  	{
  		Assert(rte->rtekind == RTE_SUBQUERY);
! 		return NIL;
  	}
  	/* Fast path for common case of childless table */
  	parentOID = rte->relid;
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1393,1399 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1398,1404 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1417,1424 ****
  	else
  		lockmode = AccessShareLock;
  
! 	/* Scan for all members of inheritance set, acquire needed locks */
! 	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
  
  	/*
  	 * Check that there's at least one descendant, else treat as no-child
--- 1422,1440 ----
  	else
  		lockmode = AccessShareLock;
  
! 	/*
! 	 * Expand partitioned table level-wise to help optimizations like
! 	 * partition-wise join which match partitions at every level. Otherwise,
! 	 * scan for all members of the inheritance set. Acquire needed locks.
! 	 */
! 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
! 	{
! 		inhOIDs = list_make1_oid(parentOID);
! 		inhOIDs = list_concat(inhOIDs,
! 							  find_inheritance_children(parentOID, lockmode));
! 	}
! 	else
! 		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
  
  	/*
  	 * Check that there's at least one descendant, else treat as no-child
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1429,1435 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1445,1451 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1457,1462 ****
--- 1473,1484 ----
  		Index		childRTindex;
  		AppendRelInfo *appinfo;
  
+ 		/*
+ 		 * If this child is a partitioned table, this contains AppendRelInfos
+ 		 * for its own children.
+ 		 */
+ 		List		  *myappinfos;
+ 
  		/* Open rel if needed; we already have required locks */
  		if (childOID != parentOID)
  			newrelation = heap_open(childOID, NoLock);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1490,1496 ****
  		childrte = copyObject(rte);
  		childrte->relid = childOID;
  		childrte->relkind = newrelation->rd_rel->relkind;
! 		childrte->inh = false;
  		childrte->requiredPerms = 0;
  		childrte->securityQuals = NIL;
  		parse->rtable = lappend(parse->rtable, childrte);
--- 1512,1523 ----
  		childrte = copyObject(rte);
  		childrte->relid = childOID;
  		childrte->relkind = newrelation->rd_rel->relkind;
! 		/* A partitioned child will need to be expanded further. */
! 		if (childOID != parentOID &&
! 			childrte->relkind == RELKIND_PARTITIONED_TABLE)
! 			childrte->inh = true;
! 		else
! 			childrte->inh = false;
  		childrte->requiredPerms = 0;
  		childrte->securityQuals = NIL;
  		parse->rtable = lappend(parse->rtable, childrte);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1498,1506 ****
  
  		/*
  		 * Build an AppendRelInfo for this parent and child, unless the child
! 		 * is a partitioned table.
  		 */
! 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
  		{
  			need_append = true;
  			appinfo = makeNode(AppendRelInfo);
--- 1525,1533 ----
  
  		/*
  		 * Build an AppendRelInfo for this parent and child, unless the child
! 		 * RTE simply duplicates the parent *partitioned* table.
  		 */
! 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
  		{
  			need_append = true;
  			appinfo = makeNode(AppendRelInfo);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1570,1575 ****
--- 1597,1610 ----
  		/* Close child relations, but keep locks */
  		if (childOID != parentOID)
  			heap_close(newrelation, NoLock);
+ 
+ 		/* Expand partitioned children recursively. */
+ 		if (childrte->inh)
+ 		{
+ 			myappinfos = expand_inherited_rtentry(root, childrte,
+ 												  childRTindex);
+ 			appinfos = list_concat(appinfos, myappinfos);
+ 		}
  	}
  
  	heap_close(oldrelation, NoLock);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1585,1591 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1620,1626 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1606,1613 ****
  		root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
  	}
  
! 	/* Otherwise, OK to add to root->append_rel_list */
! 	root->append_rel_list = list_concat(root->append_rel_list, appinfos);
  }
  
  /*
--- 1641,1648 ----
  		root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
  	}
  
! 	/* The following will be concatenated to root->append_rel_list. */
! 	return appinfos;
  }
  
  /*
*************** translate_col_privs(const Bitmapset *par
*** 1767,1776 ****
  
  /*
   * adjust_appendrel_attrs
!  *	  Copy the specified query or expression and translate Vars referring
!  *	  to the parent rel of the specified AppendRelInfo to refer to the
!  *	  child rel instead.  We also update rtindexes appearing outside Vars,
!  *	  such as resultRelation and jointree relids.
   *
   * Note: this is only applied after conversion of sublinks to subplans,
   * so we don't need to cope with recursion into sub-queries.
--- 1802,1812 ----
  
  /*
   * adjust_appendrel_attrs
!  *	  Copy the specified query or expression and translate Vars referring to
!  *	  the parent rels of the child rels specified in the given list of
!  *	  AppendRelInfos to refer to the corresponding child rel instead.  We also
!  *	  update rtindexes appearing outside Vars, such as resultRelation and
!  *	  jointree relids.
   *
   * Note: this is only applied after conversion of sublinks to subplans,
   * so we don't need to cope with recursion into sub-queries.
*************** translate_col_privs(const Bitmapset *par
*** 1779,1791 ****
   * maybe we should try to fold the two routines together.
   */
  Node *
! adjust_appendrel_attrs(PlannerInfo *root, Node *node, AppendRelInfo *appinfo)
  {
  	Node	   *result;
  	adjust_appendrel_attrs_context context;
  
  	context.root = root;
! 	context.appinfo = appinfo;
  
  	/*
  	 * Must be prepared to start with a Query or a bare expression tree.
--- 1815,1835 ----
   * maybe we should try to fold the two routines together.
   */
  Node *
! adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
! 					   AppendRelInfo **appinfos)
  {
  	Node	   *result;
  	adjust_appendrel_attrs_context context;
  
  	context.root = root;
! 	context.nappinfos = nappinfos;
! 	context.appinfos = appinfos;
! 
! 	/*
! 	 * Catch a caller who wants to adjust expressions, but doesn't pass any
! 	 * AppendRelInfo.
! 	 */
! 	Assert(appinfos && nappinfos >= 1);
  
  	/*
  	 * Must be prepared to start with a Query or a bare expression tree.
*************** adjust_appendrel_attrs(PlannerInfo *root
*** 1793,1812 ****
  	if (node && IsA(node, Query))
  	{
  		Query	   *newnode;
  
  		newnode = query_tree_mutator((Query *) node,
  									 adjust_appendrel_attrs_mutator,
  									 (void *) &context,
  									 QTW_IGNORE_RC_SUBQUERIES);
! 		if (newnode->resultRelation == appinfo->parent_relid)
  		{
! 			newnode->resultRelation = appinfo->child_relid;
! 			/* Fix tlist resnos too, if it's inherited UPDATE */
! 			if (newnode->commandType == CMD_UPDATE)
! 				newnode->targetList =
! 					adjust_inherited_tlist(newnode->targetList,
! 										   appinfo);
  		}
  		result = (Node *) newnode;
  	}
  	else
--- 1837,1864 ----
  	if (node && IsA(node, Query))
  	{
  		Query	   *newnode;
+ 		int		cnt;
  
  		newnode = query_tree_mutator((Query *) node,
  									 adjust_appendrel_attrs_mutator,
  									 (void *) &context,
  									 QTW_IGNORE_RC_SUBQUERIES);
! 		for (cnt = 0; cnt < nappinfos; cnt++)
  		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (newnode->resultRelation == appinfo->parent_relid)
! 			{
! 				newnode->resultRelation = appinfo->child_relid;
! 				/* Fix tlist resnos too, if it's inherited UPDATE */
! 				if (newnode->commandType == CMD_UPDATE)
! 					newnode->targetList =
! 									adjust_inherited_tlist(newnode->targetList,
! 														   appinfo);
! 				break;
! 			}
  		}
+ 
  		result = (Node *) newnode;
  	}
  	else
*************** static Node *
*** 1819,1831 ****
  adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context)
  {
! 	AppendRelInfo *appinfo = context->appinfo;
  
  	if (node == NULL)
  		return NULL;
  	if (IsA(node, Var))
  	{
  		Var		   *var = (Var *) copyObject(node);
  
  		if (var->varlevelsup == 0 &&
  			var->varno == appinfo->parent_relid)
--- 1871,1900 ----
  adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context)
  {
! 	AppendRelInfo **appinfos = context->appinfos;
! 	int		nappinfos = context->nappinfos;
! 	int		cnt;
! 
! 	/*
! 	 * Catch a caller who wants to adjust expressions, but doesn't pass any
! 	 * AppendRelInfo.
! 	 */
! 	Assert(appinfos && nappinfos >= 1);
  
  	if (node == NULL)
  		return NULL;
  	if (IsA(node, Var))
  	{
  		Var		   *var = (Var *) copyObject(node);
+ 		AppendRelInfo *appinfo;
+ 
+ 		for (cnt = 0; cnt < nappinfos; cnt++)
+ 		{
+ 			appinfo = appinfos[cnt];
+ 
+ 			if (var->varno == appinfo->parent_relid)
+ 				break;
+ 		}
  
  		if (var->varlevelsup == 0 &&
  			var->varno == appinfo->parent_relid)
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1908,1936 ****
  	{
  		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
  
! 		if (cexpr->cvarno == appinfo->parent_relid)
! 			cexpr->cvarno = appinfo->child_relid;
  		return (Node *) cexpr;
  	}
  	if (IsA(node, RangeTblRef))
  	{
  		RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
  
! 		if (rtr->rtindex == appinfo->parent_relid)
! 			rtr->rtindex = appinfo->child_relid;
  		return (Node *) rtr;
  	}
  	if (IsA(node, JoinExpr))
  	{
  		/* Copy the JoinExpr node with correct mutation of subnodes */
  		JoinExpr   *j;
  
  		j = (JoinExpr *) expression_tree_mutator(node,
  											  adjust_appendrel_attrs_mutator,
  												 (void *) context);
  		/* now fix JoinExpr's rtindex (probably never happens) */
! 		if (j->rtindex == appinfo->parent_relid)
! 			j->rtindex = appinfo->child_relid;
  		return (Node *) j;
  	}
  	if (IsA(node, PlaceHolderVar))
--- 1977,2030 ----
  	{
  		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
  
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (cexpr->cvarno == appinfo->parent_relid)
! 			{
! 				cexpr->cvarno = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) cexpr;
  	}
  	if (IsA(node, RangeTblRef))
  	{
  		RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
  
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (rtr->rtindex == appinfo->parent_relid)
! 			{
! 				rtr->rtindex = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) rtr;
  	}
  	if (IsA(node, JoinExpr))
  	{
  		/* Copy the JoinExpr node with correct mutation of subnodes */
  		JoinExpr   *j;
+ 		AppendRelInfo *appinfo;
  
  		j = (JoinExpr *) expression_tree_mutator(node,
  											  adjust_appendrel_attrs_mutator,
  												 (void *) context);
  		/* now fix JoinExpr's rtindex (probably never happens) */
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			appinfo = appinfos[cnt];
! 
! 			if (j->rtindex == appinfo->parent_relid)
! 			{
! 				j->rtindex = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) j;
  	}
  	if (IsA(node, PlaceHolderVar))
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1943,1951 ****
  														 (void *) context);
  		/* now fix PlaceHolderVar's relid sets */
  		if (phv->phlevelsup == 0)
! 			phv->phrels = adjust_relid_set(phv->phrels,
! 										   appinfo->parent_relid,
! 										   appinfo->child_relid);
  		return (Node *) phv;
  	}
  	/* Shouldn't need to handle planner auxiliary nodes here */
--- 2037,2044 ----
  														 (void *) context);
  		/* now fix PlaceHolderVar's relid sets */
  		if (phv->phlevelsup == 0)
! 			phv->phrels = adjust_child_relids(phv->phrels, context->nappinfos,
! 											  context->appinfos);
  		return (Node *) phv;
  	}
  	/* Shouldn't need to handle planner auxiliary nodes here */
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1976,1999 ****
  			adjust_appendrel_attrs_mutator((Node *) oldinfo->orclause, context);
  
  		/* adjust relid sets too */
! 		newinfo->clause_relids = adjust_relid_set(oldinfo->clause_relids,
! 												  appinfo->parent_relid,
! 												  appinfo->child_relid);
! 		newinfo->required_relids = adjust_relid_set(oldinfo->required_relids,
! 													appinfo->parent_relid,
! 													appinfo->child_relid);
! 		newinfo->outer_relids = adjust_relid_set(oldinfo->outer_relids,
! 												 appinfo->parent_relid,
! 												 appinfo->child_relid);
! 		newinfo->nullable_relids = adjust_relid_set(oldinfo->nullable_relids,
! 													appinfo->parent_relid,
! 													appinfo->child_relid);
! 		newinfo->left_relids = adjust_relid_set(oldinfo->left_relids,
! 												appinfo->parent_relid,
! 												appinfo->child_relid);
! 		newinfo->right_relids = adjust_relid_set(oldinfo->right_relids,
! 												 appinfo->parent_relid,
! 												 appinfo->child_relid);
  
  		/*
  		 * Reset cached derivative fields, since these might need to have
--- 2069,2092 ----
  			adjust_appendrel_attrs_mutator((Node *) oldinfo->orclause, context);
  
  		/* adjust relid sets too */
! 		newinfo->clause_relids = adjust_child_relids(oldinfo->clause_relids,
! 													 context->nappinfos,
! 													 context->appinfos);
! 		newinfo->required_relids = adjust_child_relids(oldinfo->required_relids,
! 													   context->nappinfos,
! 													   context->appinfos);
! 		newinfo->outer_relids = adjust_child_relids(oldinfo->outer_relids,
! 													context->nappinfos,
! 													context->appinfos);
! 		newinfo->nullable_relids = adjust_child_relids(oldinfo->nullable_relids,
! 													   context->nappinfos,
! 													   context->appinfos);
! 		newinfo->left_relids = adjust_child_relids(oldinfo->left_relids,
! 												   context->nappinfos,
! 												   context->appinfos);
! 		newinfo->right_relids = adjust_child_relids(oldinfo->right_relids,
! 													context->nappinfos,
! 													context->appinfos);
  
  		/*
  		 * Reset cached derivative fields, since these might need to have
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 2025,2047 ****
  }
  
  /*
!  * Substitute newrelid for oldrelid in a Relid set
   */
! static Relids
! adjust_relid_set(Relids relids, Index oldrelid, Index newrelid)
  {
! 	if (bms_is_member(oldrelid, relids))
  	{
! 		/* Ensure we have a modifiable copy */
! 		relids = bms_copy(relids);
! 		/* Remove old, add new */
! 		relids = bms_del_member(relids, oldrelid);
! 		relids = bms_add_member(relids, newrelid);
  	}
  	return relids;
  }
  
  /*
   * Adjust the targetlist entries of an inherited UPDATE operation
   *
   * The expressions have already been fixed, but we have to make sure that
--- 2118,2212 ----
  }
  
  /*
!  * Replace parent relids with child relids according to the given array of
!  * AppendRelInfos. If the given relid set contains none of the parents, it is
!  * returned as is; otherwise a modified copy is returned and the given relid
!  * set itself is left unchanged.
   */
! Relids
! adjust_child_relids(Relids relids, int nappinfos, AppendRelInfo **appinfos)
  {
! 	Bitmapset  *result = NULL;
! 	int		cnt;
! 
! 	for (cnt = 0; cnt < nappinfos; cnt++)
  	{
! 		AppendRelInfo	*appinfo = appinfos[cnt];
! 
! 		/* Remove parent, add child */
! 		if (bms_is_member(appinfo->parent_relid, relids))
! 		{
! 			/* Make a copy if we are changing the set. */
! 			if (!result)
! 				result = bms_copy(relids);
! 
! 			result = bms_del_member(result, appinfo->parent_relid);
! 			result = bms_add_member(result, appinfo->child_relid);
! 		}
  	}
+ 
+ 	/* Return new set if we modified the given set. */
+ 	if (result)
+ 		return result;
+ 
+ 	/* Else return the given relids set as is. */
  	return relids;
  }
  
  /*
+  * Replace any relid present in top_parent_relids with its child in
+  * child_relids. Members of child_relids can be multiple levels below top
+  * parent in the partition hierarchy.
+  */
+ Relids
+ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
+ 							   Relids child_relids, Relids top_parent_relids)
+ {
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 	Relids		parent_relids = NULL;
+ 	Relids		result;
+ 	Relids		tmp_result = NULL;
+ 	int		cnt;
+ 
+ 	/*
+ 	 * If the given relids set doesn't contain any of the top parent relids,
+ 	 * it will remain unchanged.
+ 	 */
+ 	if (!bms_overlap(relids, top_parent_relids))
+ 		return relids;
+ 
+ 	appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos);
+ 
+ 	/* Construct relids set for the immediate parent of the given child. */
+ 	for (cnt = 0; cnt < nappinfos; cnt++)
+ 	{
+ 		AppendRelInfo   *appinfo = appinfos[cnt];
+ 
+ 		parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
+ 	}
+ 
+ 	/* Recurse if immediate parent is not the top parent. */
+ 	if (!bms_equal(parent_relids, top_parent_relids))
+ 	{
+ 		tmp_result = adjust_child_relids_multilevel(root, relids,
+ 													parent_relids,
+ 													top_parent_relids);
+ 		relids = tmp_result;
+ 	}
+ 
+ 	result = adjust_child_relids(relids, nappinfos, appinfos);
+ 
+ 	/* Free memory consumed by any intermediate result. */
+ 	if (tmp_result)
+ 		bms_free(tmp_result);
+ 	bms_free(parent_relids);
+ 	pfree(appinfos);
+ 
+ 	return result;
+ }
+ 
+ /*
   * Adjust the targetlist entries of an inherited UPDATE operation
   *
   * The expressions have already been fixed, but we have to make sure that
*************** adjust_inherited_tlist(List *tlist, Appe
*** 2142,2162 ****
   * adjust_appendrel_attrs_multilevel
   *	  Apply Var translations from a toplevel appendrel parent down to a child.
   *
!  * In some cases we need to translate expressions referencing a baserel
   * to reference an appendrel child that's multiple levels removed from it.
   */
  Node *
  adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  RelOptInfo *child_rel)
  {
! 	AppendRelInfo *appinfo = find_childrel_appendrelinfo(root, child_rel);
! 	RelOptInfo *parent_rel = find_base_rel(root, appinfo->parent_relid);
  
- 	/* If parent is also a child, first recurse to apply its translations */
- 	if (IS_OTHER_REL(parent_rel))
- 		node = adjust_appendrel_attrs_multilevel(root, node, parent_rel);
- 	else
- 		Assert(parent_rel->reloptkind == RELOPT_BASEREL);
  	/* Now translate for this child */
! 	return adjust_appendrel_attrs(root, node, appinfo);
  }
--- 2307,2432 ----
   * adjust_appendrel_attrs_multilevel
   *	  Apply Var translations from a toplevel appendrel parent down to a child.
   *
!  * In some cases we need to translate expressions referencing a parent relation
   * to reference an appendrel child that's multiple levels removed from it.
   */
  Node *
  adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  Relids child_relids,
! 								  Relids top_parent_relids)
  {
! 	AppendRelInfo **appinfos;
! 	Bitmapset  *parent_relids = NULL;
! 	int		nappinfos;
! 	int		cnt;
! 
! 	Assert(bms_num_members(child_relids) == bms_num_members(top_parent_relids));
! 
! 	appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos);
! 
! 	/* Construct relids set for the immediate parent of given child. */
! 	for (cnt = 0; cnt < nappinfos; cnt++)
! 	{
! 		AppendRelInfo  *appinfo = appinfos[cnt];
! 
! 		parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
! 	}
! 
! 	/* Recurse if immediate parent is not the top parent. */
! 	if (!bms_equal(parent_relids, top_parent_relids))
! 		node = adjust_appendrel_attrs_multilevel(root, node, parent_relids,
! 												 top_parent_relids);
  
  	/* Now translate for this child */
! 	node = adjust_appendrel_attrs(root, node, nappinfos, appinfos);
! 
! 	pfree(appinfos);
! 
! 	return node;
! }
! 
! /*
!  * Construct the SpecialJoinInfo for a child-join by translating
!  * SpecialJoinInfo for the join between parents. left_relids and right_relids
!  * are the relids of left and right side of the join respectively.
!  */
! SpecialJoinInfo *
! build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
! 						Relids left_relids, Relids right_relids)
! {
! 	SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo);
! 	AppendRelInfo **left_appinfos;
! 	int		left_nappinfos;
! 	AppendRelInfo **right_appinfos;
! 	int		right_nappinfos;
! 
! 	memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo));
! 	left_appinfos = find_appinfos_by_relids(root, left_relids,
! 											&left_nappinfos);
! 	right_appinfos = find_appinfos_by_relids(root, right_relids,
! 											 &right_nappinfos);
! 
! 	sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand,
! 											   left_nappinfos, left_appinfos);
! 	sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand,
! 												right_nappinfos,
! 												right_appinfos);
! 	sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand,
! 											   left_nappinfos, left_appinfos);
! 	sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand,
! 												right_nappinfos,
! 												right_appinfos);
! 
! 	/*
! 	 * Replace the Var nodes of parent with those of children in expressions.
! 	 * This function may be called within a temporary context, but the
! 	 * expressions will be shallow-copied into the plan. Hence copy those in
! 	 * the planner's context.
! 	 */
! 	sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root,
! 											   (Node *) sjinfo->semi_rhs_exprs,
! 															   right_nappinfos,
! 															   right_appinfos);
! 
! 	pfree(left_appinfos);
! 	pfree(right_appinfos);
! 
! 	return sjinfo;
! }
! 
! /*
!  * find_appinfos_by_relids
!  * 		Find AppendRelInfo structures for all relations specified by relids.
!  *
!  * The AppendRelInfos are returned in an array, which can be pfree'd by the
!  * caller.
!  */
! AppendRelInfo **
! find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
! {
! 	ListCell   *lc;
! 	AppendRelInfo **appinfos;
! 	int		cnt = 0;
! 
! 	*nappinfos = bms_num_members(relids);
! 	appinfos = (AppendRelInfo **) palloc(sizeof(AppendRelInfo *) * *nappinfos);
! 
! 	foreach (lc, root->append_rel_list)
! 	{
! 		AppendRelInfo *appinfo = lfirst(lc);
! 
! 		if (bms_is_member(appinfo->child_relid, relids))
! 		{
! 			appinfos[cnt] = appinfo;
! 			cnt++;
! 
! 			/* Stop when we have gathered all the AppendRelInfos. */
! 			if (cnt == *nappinfos)
! 				return appinfos;
! 		}
! 	}
! 
! 	/* Should have found the entries ... */
! 	elog(ERROR, "did not find all requested child rels in append_rel_list");
! 	return NULL;	/* not reached */
  }
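
Reviewer note on adjust_child_relids() above: the sketch below is only a
minimal illustration of the parent-to-child relid substitution, not code from
the patch. It assumes a 64-bit mask in place of Bitmapset and a two-field
struct in place of AppendRelInfo; RelidMask, AppInfoSketch and
adjust_child_relids_sketch are hypothetical names. Because the mask is passed
by value, the "return the input pointer unchanged if nothing matched" detail
of the real function has no counterpart here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means relid N is a member. */
typedef uint64_t RelidMask;

/* Hypothetical stand-in keeping only the AppendRelInfo fields the loop reads. */
typedef struct
{
	unsigned	parent_relid;
	unsigned	child_relid;
} AppInfoSketch;

/*
 * For every parent relid that is a member of the input set, remove the
 * parent and add its child.  Relids not named by any entry are untouched.
 */
static RelidMask
adjust_child_relids_sketch(RelidMask relids, int nappinfos,
						   const AppInfoSketch *appinfos)
{
	RelidMask	result = relids;
	int			cnt;

	for (cnt = 0; cnt < nappinfos; cnt++)
	{
		RelidMask	parent_bit;
		RelidMask	child_bit;

		/* A 64-bit mask can only represent relids 0..63. */
		assert(appinfos[cnt].parent_relid < 64);
		assert(appinfos[cnt].child_relid < 64);

		parent_bit = (RelidMask) 1 << appinfos[cnt].parent_relid;
		child_bit = (RelidMask) 1 << appinfos[cnt].child_relid;

		/* Membership is tested against the input set, as in the patch. */
		if (relids & parent_bit)
		{
			result &= ~parent_bit;
			result |= child_bit;
		}
	}

	return result;
}
```

With entries {parent 1, child 4} and {parent 2, child 6}, the set {1, 2, 3}
would become {3, 4, 6}, while a set such as {3, 5} that mentions no parent
comes back unchanged. adjust_child_relids_multilevel() applies the same step
once per level of the partition hierarchy, starting from the top parent.
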
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
new file mode 100644
index 2d5caae..2bacec9
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
***************
*** 18,32 ****
--- 18,39 ----
  
  #include "miscadmin.h"
  #include "nodes/nodeFuncs.h"
+ #include "nodes/extensible.h"
  #include "optimizer/clauses.h"
  #include "optimizer/cost.h"
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
+ /* TODO Remove this if get_grouping_expressions ends up in another module. */
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
+ #include "foreign/fdwapi.h"
  #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
  #include "utils/selfuncs.h"
  
  
*************** set_cheapest(RelOptInfo *parent_rel)
*** 409,416 ****
   * Returns nothing, but modifies parent_rel->pathlist.
   */
  void
! add_path(RelOptInfo *parent_rel, Path *new_path)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	List	   *new_path_pathkeys;
--- 416,424 ----
   * Returns nothing, but modifies parent_rel->pathlist.
   */
  void
! add_path(RelOptInfo *parent_rel, Path *new_path, bool grouped)
  {
+ 	List	   *pathlist;
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	List	   *new_path_pathkeys;
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 427,432 ****
--- 435,448 ----
  	/* Pretend parameterized paths have no pathkeys, per comment above */
  	new_path_pathkeys = new_path->param_info ? NIL : new_path->pathkeys;
  
+ 	if (!grouped)
+ 		pathlist = parent_rel->pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->pathlist;
+ 	}
+ 
  	/*
  	 * Loop to check proposed new path against old paths.  Note it is possible
  	 * for more than one old path to be tossed out because new_path dominates
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 436,442 ****
  	 * list cell.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(parent_rel->pathlist); p1 != NULL; p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		bool		remove_old = false; /* unless new proves superior */
--- 452,458 ----
  	 * list cell.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(pathlist); p1 != NULL; p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		bool		remove_old = false; /* unless new proves superior */
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 582,589 ****
  		 */
  		if (remove_old)
  		{
! 			parent_rel->pathlist = list_delete_cell(parent_rel->pathlist,
! 													p1, p1_prev);
  
  			/*
  			 * Delete the data pointed-to by the deleted cell, if possible
--- 598,604 ----
  		 */
  		if (remove_old)
  		{
! 			pathlist = list_delete_cell(pathlist, p1, p1_prev);
  
  			/*
  			 * Delete the data pointed-to by the deleted cell, if possible
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 614,622 ****
  	{
  		/* Accept the new path: insert it at proper place in pathlist */
  		if (insert_after)
! 			lappend_cell(parent_rel->pathlist, insert_after, new_path);
  		else
! 			parent_rel->pathlist = lcons(new_path, parent_rel->pathlist);
  	}
  	else
  	{
--- 629,642 ----
  	{
  		/* Accept the new path: insert it at proper place in pathlist */
  		if (insert_after)
! 			lappend_cell(pathlist, insert_after, new_path);
  		else
! 			pathlist = lcons(new_path, pathlist);
! 
! 		if (!grouped)
! 			parent_rel->pathlist = pathlist;
! 		else
! 			parent_rel->gpi->pathlist = pathlist;
  	}
  	else
  	{
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 646,653 ****
  bool
  add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer)
  {
  	List	   *new_path_pathkeys;
  	bool		consider_startup;
  	ListCell   *p1;
--- 666,674 ----
  bool
  add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer, bool grouped)
  {
+ 	List	   *pathlist;
  	List	   *new_path_pathkeys;
  	bool		consider_startup;
  	ListCell   *p1;
*************** add_path_precheck(RelOptInfo *parent_rel
*** 656,664 ****
  	new_path_pathkeys = required_outer ? NIL : pathkeys;
  
  	/* Decide whether new path's startup cost is interesting */
! 	consider_startup = required_outer ? parent_rel->consider_param_startup : parent_rel->consider_startup;
  
! 	foreach(p1, parent_rel->pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
--- 677,694 ----
  	new_path_pathkeys = required_outer ? NIL : pathkeys;
  
  	/* Decide whether new path's startup cost is interesting */
! 	consider_startup = required_outer ? parent_rel->consider_param_startup :
! 		parent_rel->consider_startup;
  
! 	if (!grouped)
! 		pathlist = parent_rel->pathlist;
! 	else
! 	{
! 		Assert(parent_rel->gpi != NULL);
! 		pathlist = parent_rel->gpi->pathlist;
! 	}
! 
! 	foreach(p1, pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
*************** add_path_precheck(RelOptInfo *parent_rel
*** 749,771 ****
   *	  referenced by partial BitmapHeapPaths.
   */
  void
! add_partial_path(RelOptInfo *parent_rel, Path *new_path)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	ListCell   *p1;
  	ListCell   *p1_prev;
  	ListCell   *p1_next;
  
  	/* Check for query cancel. */
  	CHECK_FOR_INTERRUPTS();
  
  	/*
  	 * As in add_path, throw out any paths which are dominated by the new
  	 * path, but throw out the new path if some existing path dominates it.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL;
  		 p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
--- 779,810 ----
   *	  referenced by partial BitmapHeapPaths.
   */
  void
! add_partial_path(RelOptInfo *parent_rel, Path *new_path, bool grouped)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	ListCell   *p1;
  	ListCell   *p1_prev;
  	ListCell   *p1_next;
+ 	List	   *pathlist;
  
  	/* Check for query cancel. */
  	CHECK_FOR_INTERRUPTS();
  
+ 	if (!grouped)
+ 		pathlist = parent_rel->partial_pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->partial_pathlist;
+ 	}
+ 
  	/*
  	 * As in add_path, throw out any paths which are dominated by the new
  	 * path, but throw out the new path if some existing path dominates it.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(pathlist); p1 != NULL;
  		 p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
*************** add_partial_path(RelOptInfo *parent_rel,
*** 819,830 ****
  		}
  
  		/*
! 		 * Remove current element from partial_pathlist if dominated by new.
  		 */
  		if (remove_old)
  		{
! 			parent_rel->partial_pathlist =
! 				list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
  			pfree(old_path);
  			/* p1_prev does not advance */
  		}
--- 858,868 ----
  		}
  
  		/*
! 		 * Remove current element from pathlist if dominated by new.
  		 */
  		if (remove_old)
  		{
! 			pathlist = list_delete_cell(pathlist, p1, p1_prev);
  			pfree(old_path);
  			/* p1_prev does not advance */
  		}
*************** add_partial_path(RelOptInfo *parent_rel,
*** 839,845 ****
  
  		/*
  		 * If we found an old path that dominates new_path, we can quit
! 		 * scanning the partial_pathlist; we will not add new_path, and we
  		 * assume new_path cannot dominate any later path.
  		 */
  		if (!accept_new)
--- 877,883 ----
  
  		/*
  		 * If we found an old path that dominates new_path, we can quit
! 		 * scanning the pathlist; we will not add new_path, and we
  		 * assume new_path cannot dominate any later path.
  		 */
  		if (!accept_new)
*************** add_partial_path(RelOptInfo *parent_rel,
*** 850,859 ****
  	{
  		/* Accept the new path: insert it at proper place */
  		if (insert_after)
! 			lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
  		else
! 			parent_rel->partial_pathlist =
! 				lcons(new_path, parent_rel->partial_pathlist);
  	}
  	else
  	{
--- 888,901 ----
  	{
  		/* Accept the new path: insert it at proper place */
  		if (insert_after)
! 			lappend_cell(pathlist, insert_after, new_path);
  		else
! 			pathlist = lcons(new_path, pathlist);
! 
! 		if (!grouped)
! 			parent_rel->partial_pathlist = pathlist;
! 		else
! 			parent_rel->gpi->partial_pathlist = pathlist;
  	}
  	else
  	{
*************** add_partial_path(RelOptInfo *parent_rel,
*** 874,882 ****
   */
  bool
  add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
! 						  List *pathkeys)
  {
  	ListCell   *p1;
  
  	/*
  	 * Our goal here is twofold.  First, we want to find out whether this path
--- 916,933 ----
   */
  bool
  add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
! 						  List *pathkeys, bool grouped)
  {
  	ListCell   *p1;
+ 	List	   *pathlist;
+ 
+ 	if (!grouped)
+ 		pathlist = parent_rel->partial_pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->partial_pathlist;
+ 	}
  
  	/*
  	 * Our goal here is twofold.  First, we want to find out whether this path
*************** add_partial_path_precheck(RelOptInfo *pa
*** 886,895 ****
  	 * final cost computations.  If so, we definitely want to consider it.
  	 *
  	 * Unlike add_path(), we always compare pathkeys here.  This is because we
! 	 * expect partial_pathlist to be very short, and getting a definitive
! 	 * answer at this stage avoids the need to call add_path_precheck.
  	 */
! 	foreach(p1, parent_rel->partial_pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
--- 937,947 ----
  	 * final cost computations.  If so, we definitely want to consider it.
  	 *
  	 * Unlike add_path(), we always compare pathkeys here.  This is because we
! 	 * expect partial_pathlist / grouped_pathlist to be very short, and
! 	 * getting a definitive answer at this stage avoids the need to call
! 	 * add_path_precheck.
  	 */
! 	foreach(p1, pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
*************** add_partial_path_precheck(RelOptInfo *pa
*** 918,924 ****
  	 * completion.
  	 */
  	if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
! 						   NULL))
  		return false;
  
  	return true;
--- 970,976 ----
  	 * completion.
  	 */
  	if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
! 						   NULL, grouped))
  		return false;
  
  	return true;
*************** create_foreignscan_path(PlannerInfo *roo
*** 1994,2007 ****
   * Note: result must not share storage with either input
   */
  Relids
! calc_nestloop_required_outer(Path *outer_path, Path *inner_path)
  {
- 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
- 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
  	Relids		required_outer;
  
  	/* inner_path can require rels from outer path, but not vice versa */
! 	Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
  	/* easy case if inner path is not parameterized */
  	if (!inner_paramrels)
  		return bms_copy(outer_paramrels);
--- 2046,2060 ----
   * Note: result must not share storage with either input
   */
  Relids
! calc_nestloop_required_outer(Relids outerrelids,
! 							 Relids outer_paramrels,
! 							 Relids innerrelids,
! 							 Relids inner_paramrels)
  {
  	Relids		required_outer;
  
  	/* inner_path can require rels from outer path, but not vice versa */
! 	Assert(!bms_overlap(outer_paramrels, innerrelids));
  	/* easy case if inner path is not parameterized */
  	if (!inner_paramrels)
  		return bms_copy(outer_paramrels);
*************** calc_nestloop_required_outer(Path *outer
*** 2009,2015 ****
  	required_outer = bms_union(outer_paramrels, inner_paramrels);
  	/* ... and remove any mention of now-satisfied outer rels */
  	required_outer = bms_del_members(required_outer,
! 									 outer_path->parent->relids);
  	/* maintain invariant that required_outer is exactly NULL if empty */
  	if (bms_is_empty(required_outer))
  	{
--- 2062,2068 ----
  	required_outer = bms_union(outer_paramrels, inner_paramrels);
  	/* ... and remove any mention of now-satisfied outer rels */
  	required_outer = bms_del_members(required_outer,
! 									 outerrelids);
  	/* maintain invariant that required_outer is exactly NULL if empty */
  	if (bms_is_empty(required_outer))
  	{
*************** calc_non_nestloop_required_outer(Path *o
*** 2055,2060 ****
--- 2108,2114 ----
   * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
   * 'pathkeys' are the path keys of the new join path
   * 'required_outer' is the set of required outer rels
+  * 'target' can be passed to override that of joinrel.
   *
   * Returns the resulting path node.
   */
*************** create_nestloop_path(PlannerInfo *root,
*** 2068,2074 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer)
  {
  	NestPath   *pathnode = makeNode(NestPath);
  	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);
--- 2122,2129 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer,
! 					 PathTarget *target)
  {
  	NestPath   *pathnode = makeNode(NestPath);
  	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);
*************** create_nestloop_path(PlannerInfo *root,
*** 2101,2107 ****
  
  	pathnode->path.pathtype = T_NestLoop;
  	pathnode->path.parent = joinrel;
! 	pathnode->path.pathtarget = joinrel->reltarget;
  	pathnode->path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2156,2162 ----
  
  	pathnode->path.pathtype = T_NestLoop;
  	pathnode->path.parent = joinrel;
! 	pathnode->path.pathtarget = target == NULL ? joinrel->reltarget : target;
  	pathnode->path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_mergejoin_path(PlannerInfo *root,
*** 2159,2171 ****
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys)
  {
  	MergePath  *pathnode = makeNode(MergePath);
  
  	pathnode->jpath.path.pathtype = T_MergeJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = joinrel->reltarget;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2214,2228 ----
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys,
! 					  PathTarget *target)
  {
  	MergePath  *pathnode = makeNode(MergePath);
  
  	pathnode->jpath.path.pathtype = T_MergeJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = target == NULL ? joinrel->reltarget :
! 		target;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_mergejoin_path(PlannerInfo *root,
*** 2210,2215 ****
--- 2267,2273 ----
   * 'required_outer' is the set of required outer rels
   * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
   *		(this should be a subset of the restrict_clauses list)
+  * 'target' can be passed to override that of joinrel.
   */
  HashPath *
  create_hashjoin_path(PlannerInfo *root,
*************** create_hashjoin_path(PlannerInfo *root,
*** 2221,2233 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses)
  {
  	HashPath   *pathnode = makeNode(HashPath);
  
  	pathnode->jpath.path.pathtype = T_HashJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = joinrel->reltarget;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2279,2293 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses,
! 					 PathTarget *target)
  {
  	HashPath   *pathnode = makeNode(HashPath);
  
  	pathnode->jpath.path.pathtype = T_HashJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = target == NULL ? joinrel->reltarget :
! 		target;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_agg_path(PlannerInfo *root,
*** 2682,2688 ****
  	pathnode->path.pathtarget = target;
  	/* For now, assume we are above any joins, so no parameterization */
  	pathnode->path.param_info = NULL;
! 	pathnode->path.parallel_aware = false;
  	pathnode->path.parallel_safe = rel->consider_parallel &&
  		subpath->parallel_safe;
  	pathnode->path.parallel_workers = subpath->parallel_workers;
--- 2742,2748 ----
  	pathnode->path.pathtarget = target;
  	/* For now, assume we are above any joins, so no parameterization */
  	pathnode->path.param_info = NULL;
! 	pathnode->path.parallel_aware = true;
  	pathnode->path.parallel_safe = rel->consider_parallel &&
  		subpath->parallel_safe;
  	pathnode->path.parallel_workers = subpath->parallel_workers;
*************** create_agg_path(PlannerInfo *root,
*** 2713,2718 ****
--- 2773,2948 ----
  }
  
  /*
+  * Apply partial AGG_SORTED aggregation path to subpath if it's suitably
+  * sorted.
+  *
+  * first_call indicates whether the function is being called for the first
+  * time for a given index --- since the target should not change, we can skip
+  * the sort-order check during subsequent calls.
+  *
+  * group_clauses, group_exprs and agg_exprs are pointers to lists that we
+  * populate on the first call for a particular index, and that the caller
+  * passes back in on subsequent calls.
+  *
+  * NULL is returned if sorting of subpath output is not suitable.
+  */
+ AggPath *
+ create_partial_agg_sorted_path(PlannerInfo *root, Path *subpath,
+ 							   bool first_call,
+ 							   List **group_clauses, List **group_exprs,
+ 							   List **agg_exprs, double input_rows)
+ {
+ 	RelOptInfo	*rel;
+ 	AggClauseCosts  agg_costs;
+ 	double	dNumGroups;
+ 	AggPath	*result = NULL;
+ 
+ 	rel = subpath->parent;
+ 	Assert(rel->gpi != NULL);
+ 
+ 	if (subpath->pathkeys == NIL)
+ 		return NULL;
+ 
+ 	if (!grouping_is_sortable(root->parse->groupClause))
+ 		return NULL;
+ 
+ 	if (first_call)
+ 	{
+ 		ListCell	*lc1;
+ 		List	*key_subset = NIL;
+ 
+ 		/*
+ 		 * Find all query pathkeys that our relation does affect.
+ 		 */
+ 		foreach(lc1, root->group_pathkeys)
+ 		{
+ 			PathKey	*gkey = castNode(PathKey, lfirst(lc1));
+ 			ListCell	*lc2;
+ 
+ 			foreach(lc2, subpath->pathkeys)
+ 			{
+ 				PathKey	*skey = castNode(PathKey, lfirst(lc2));
+ 
+ 				if (skey == gkey)
+ 				{
+ 					key_subset = lappend(key_subset, gkey);
+ 					break;
+ 				}
+ 			}
+ 		}
+ 
+ 		if (key_subset == NIL)
+ 			return NULL;
+ 
+ 		/* Check if AGG_SORTED is useful for the whole query.  */
+ 		if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
+ 			return NULL;
+ 	}
+ 
+ 	if (first_call)
+ 		get_grouping_expressions(root, rel->gpi->target, group_clauses,
+ 								 group_exprs, agg_exprs);
+ 
+ 	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ 	Assert(*agg_exprs != NIL);
+ 	get_agg_clause_costs(root, (Node *) *agg_exprs, AGGSPLIT_INITIAL_SERIAL,
+ 						 &agg_costs);
+ 
+ 	Assert(*group_exprs != NIL);
+ 	dNumGroups = estimate_num_groups(root, *group_exprs, input_rows, NULL);
+ 
+ 	/* TODO HAVING qual. */
+ 	Assert(*group_clauses != NIL);
+ 	result = create_agg_path(root, rel, subpath, rel->gpi->target, AGG_SORTED,
+ 							 AGGSPLIT_INITIAL_SERIAL, *group_clauses, NIL,
+ 							 &agg_costs, dNumGroups);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * Apply partial AGG_HASHED aggregation to subpath.
+  *
+  * Arguments have the same meaning as those of create_partial_agg_sorted_path.
+  *
+  */
+ AggPath *
+ create_partial_agg_hashed_path(PlannerInfo *root, Path *subpath,
+ 							   bool first_call,
+ 							   List **group_clauses, List **group_exprs,
+ 							   List **agg_exprs, double input_rows)
+ {
+ 	RelOptInfo	*rel;
+ 	bool	can_hash;
+ 	AggClauseCosts  agg_costs;
+ 	double	dNumGroups;
+ 	Size	hashaggtablesize;
+ 	Query	   *parse = root->parse;
+ 	AggPath	*result = NULL;
+ 
+ 	rel = subpath->parent;
+ 	Assert(rel->gpi != NULL);
+ 
+ 	if (first_call)
+ 	{
+ 		/*
+ 		 * Find one grouping clause per grouping column.
+ 		 *
+ 		 * All that create_agg_plan eventually needs of the clause is
+ 		 * tleSortGroupRef, so we don't have to care that the clause
+ 		 * expression might differ from texpr, in case texpr was derived from
+ 		 * EC.
+ 		 */
+ 		get_grouping_expressions(root, rel->gpi->target, group_clauses,
+ 								 group_exprs, agg_exprs);
+ 	}
+ 
+ 	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ 	Assert(*agg_exprs != NIL);
+ 	get_agg_clause_costs(root, (Node *) *agg_exprs, AGGSPLIT_INITIAL_SERIAL,
+ 						 &agg_costs);
+ 
+ 	can_hash = (parse->groupClause != NIL &&
+ 				parse->groupingSets == NIL &&
+ 				agg_costs.numOrderedAggs == 0 &&
+ 				grouping_is_hashable(parse->groupClause));
+ 
+ 	if (can_hash)
+ 	{
+ 		Assert(*group_exprs != NIL);
+ 		dNumGroups = estimate_num_groups(root, *group_exprs, input_rows,
+ 										 NULL);
+ 
+ 		hashaggtablesize = estimate_hashagg_tablesize(subpath, &agg_costs,
+ 													  dNumGroups);
+ 
+ 		if (hashaggtablesize < work_mem * 1024L)
+ 		{
+ 			/*
+ 			 * Create the partial aggregation path.
+ 			 */
+ 			Assert(*group_clauses != NIL);
+ 
+ 			result = create_agg_path(root, rel, subpath,
+ 									 rel->gpi->target,
+ 									 AGG_HASHED,
+ 									 AGGSPLIT_INITIAL_SERIAL,
+ 									 *group_clauses, NIL,
+ 									 &agg_costs,
+ 									 dNumGroups);
+ 
+ 			/*
+ 			 * The agg path should require no fewer parameters than the plain
+ 			 * one.
+ 			 */
+ 			result->path.param_info = subpath->param_info;
+ 		}
+ 	}
+ 
+ 	return result;
+ }
+ 
+ /*
   * create_groupingsets_path
   *	  Creates a pathnode that represents performing GROUPING SETS aggregation
   *
*************** reparameterize_path(PlannerInfo *root, P
*** 3426,3428 ****
--- 3656,4081 ----
  	}
  	return NULL;
  }
+ 
+ /*
+  * reparameterize_path_by_child
+  * 		Given a path parameterized by the parent of the given relation,
+  * 		translate the path to be parameterized by the given child relation.
+  *
+  * The function creates a new path of the same type as the given path, but
+  * parameterized by the given child relation. If it can not reparameterize the
+  * path as required, it returns NULL.
+  *
+  * The cost, number of rows, width and parallel path properties depend upon
+  * path->parent, which does not change during the translation. Hence those
+  * members are copied as they are.
+  */
+ 
+ Path *
+ reparameterize_path_by_child(PlannerInfo *root, Path *path,
+ 							  RelOptInfo *child_rel)
+ {
+ 
+ #define FLAT_COPY_PATH(newnode, node, nodetype)  \
+ 	( (newnode) = makeNode(nodetype), \
+ 	  memcpy((newnode), (node), sizeof(nodetype)) )
+ 
+ 	Path	   *new_path;
+ 	ParamPathInfo   *new_ppi;
+ 	ParamPathInfo   *old_ppi;
+ 	Relids		required_outer;
+ 
+ 	/*
+ 	 * If the path is not parameterized by the parent of the given relation, it
+ 	 * doesn't need reparameterization.
+ 	 */
+ 	if (!path->param_info ||
+ 		!bms_overlap(PATH_REQ_OUTER(path), child_rel->top_parent_relids))
+ 		return path;
+ 
+ 	/*
+ 	 * Make a copy of the given path and reparameterize or translate the
+ 	 * path specific members.
+ 	 */
+ 	switch (nodeTag(path))
+ 	{
+ 		case T_Path:
+ 			FLAT_COPY_PATH(new_path, path, Path);
+ 			break;
+ 
+ 		case T_IndexPath:
+ 			{
+ 				IndexPath *ipath;
+ 
+ 				FLAT_COPY_PATH(ipath, path, IndexPath);
+ 				ipath->indexclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 												  (Node *) ipath->indexclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				ipath->indexquals = (List *) adjust_appendrel_attrs_multilevel(root,
+ 													(Node *) ipath->indexquals,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) ipath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapHeapPath:
+ 			{
+ 				BitmapHeapPath *bhpath;
+ 
+ 				FLAT_COPY_PATH(bhpath, path, BitmapHeapPath);
+ 				bhpath->bitmapqual = reparameterize_path_by_child(root,
+ 															bhpath->bitmapqual,
+ 																	child_rel);
+ 				new_path = (Path *) bhpath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapAndPath:
+ 			{
+ 				BitmapAndPath *bapath;
+ 				ListCell   *lc;
+ 				List	   *bitmapquals = NIL;
+ 
+ 				FLAT_COPY_PATH(bapath, path, BitmapAndPath);
+ 				foreach (lc, bapath->bitmapquals)
+ 				{
+ 					Path   *bmqpath = lfirst(lc);
+ 
+ 					bitmapquals = lappend(bitmapquals,
+ 										  reparameterize_path_by_child(root,
+ 																	   bmqpath,
+ 																   child_rel));
+ 				}
+ 				bapath->bitmapquals = bitmapquals;
+ 				new_path = (Path *) bapath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapOrPath:
+ 			{
+ 				BitmapOrPath *bopath;
+ 				ListCell   *lc;
+ 				List	   *bitmapquals = NIL;
+ 
+ 				FLAT_COPY_PATH(bopath, path, BitmapOrPath);
+ 				foreach (lc, bopath->bitmapquals)
+ 				{
+ 					Path   *bmqpath = lfirst(lc);
+ 
+ 					bitmapquals = lappend(bitmapquals,
+ 										  reparameterize_path_by_child(root,
+ 																	   bmqpath,
+ 																   child_rel));
+ 				}
+ 				bopath->bitmapquals = bitmapquals;
+ 				new_path = (Path *) bopath;
+ 			}
+ 			break;
+ 
+ 		case T_TidPath:
+ 			{
+ 				TidPath *tpath;
+ 
+ 				/*
+ 				 * TidPath contains tidquals, which do not contain any external
+ 				 * parameters per create_tidscan_path(). So don't bother to
+ 				 * translate those.
+ 				 */
+ 				FLAT_COPY_PATH(tpath, path, TidPath);
+ 				new_path = (Path *) tpath;
+ 			}
+ 			break;
+ 
+ 		case T_ForeignPath:
+ 			{
+ 				ForeignPath   *fpath;
+ 				ReparameterizeForeignPathByChild_function rfpc_func;
+ 
+ 				FLAT_COPY_PATH(fpath, path, ForeignPath);
+ 				if (fpath->fdw_outerpath)
+ 					fpath->fdw_outerpath = reparameterize_path_by_child(root,
+ 														  fpath->fdw_outerpath,
+ 																	child_rel);
+ 				rfpc_func = path->parent->fdwroutine->ReparameterizeForeignPathByChild;
+ 
+ 				/* Hand over to FDW if supported. */
+ 				if (rfpc_func)
+ 					fpath->fdw_private = rfpc_func(root, fpath->fdw_private,
+ 													child_rel);
+ 				new_path = (Path *) fpath;
+ 			}
+ 			break;
+ 
+ 		case T_CustomPath:
+ 			{
+ 				CustomPath *cpath;
+ 				ListCell   *lc;
+ 				List	   *custompaths = NIL;
+ 
+ 				FLAT_COPY_PATH(cpath, path, CustomPath);
+ 
+ 				foreach (lc, cpath->custom_paths)
+ 				{
+ 					Path   *subpath = lfirst(lc);
+ 
+ 					custompaths = lappend(custompaths,
+ 										  reparameterize_path_by_child(root,
+ 																	   subpath,
+ 																   child_rel));
+ 				}
+ 				cpath->custom_paths = custompaths;
+ 
+ 				if (cpath->methods &&
+ 					cpath->methods->ReparameterizeCustomPathByChild)
+ 					cpath->custom_private = cpath->methods->ReparameterizeCustomPathByChild(root,
+ 														 cpath->custom_private,
+ 														 child_rel);
+ 
+ 				new_path = (Path *) cpath;
+ 			}
+ 			break;
+ 
+ 		case T_NestPath:
+ 			{
+ 				JoinPath *jpath;
+ 
+ 				FLAT_COPY_PATH(jpath, path, NestPath);
+ 
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) jpath;
+ 			}
+ 			break;
+ 
+ 		case T_MergePath:
+ 			{
+ 				JoinPath *jpath;
+ 				MergePath  *mpath;
+ 
+ 				FLAT_COPY_PATH(mpath, path, MergePath);
+ 
+ 				jpath = (JoinPath *) mpath;
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				mpath->path_mergeclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											 (Node *) mpath->path_mergeclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) mpath;
+ 			}
+ 			break;
+ 
+ 		case T_HashPath:
+ 			{
+ 				JoinPath *jpath;
+ 				HashPath   *hpath;
+ 				FLAT_COPY_PATH(hpath, path, HashPath);
+ 
+ 				jpath = (JoinPath *) hpath;
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				hpath->path_hashclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) hpath->path_hashclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) hpath;
+ 			}
+ 			break;
+ 
+ 		case T_AppendPath:
+ 			{
+ 				AppendPath	*apath;
+ 				List		*subpaths = NIL;
+ 				ListCell	*lc;
+ 
+ 				FLAT_COPY_PATH(apath, path, AppendPath);
+ 				foreach (lc, apath->subpaths)
+ 					subpaths = lappend(subpaths,
+ 									   reparameterize_path_by_child(root,
+ 																	lfirst(lc),
+ 																	child_rel));
+ 				apath->subpaths = subpaths;
+ 				new_path = (Path *) apath;
+ 			}
+ 			break;
+ 
+ 		case T_MergeAppendPath:
+ 			{
+ 				MergeAppendPath	*mapath;
+ 				List		*subpaths = NIL;
+ 				ListCell	*lc;
+ 
+ 				FLAT_COPY_PATH(mapath, path, MergeAppendPath);
+ 				foreach (lc, mapath->subpaths)
+ 					subpaths = lappend(subpaths,
+ 									   reparameterize_path_by_child(root,
+ 																	lfirst(lc),
+ 																	child_rel));
+ 				mapath->subpaths = subpaths;
+ 				new_path = (Path *) mapath;
+ 			}
+ 			break;
+ 
+ 		case T_MaterialPath:
+ 			{
+ 				MaterialPath *mpath;
+ 
+ 				FLAT_COPY_PATH(mpath, path, MaterialPath);
+ 				mpath->subpath = reparameterize_path_by_child(root,
+ 															  mpath->subpath,
+ 															  child_rel);
+ 				new_path = (Path *) mpath;
+ 			}
+ 			break;
+ 
+ 		case T_UniquePath:
+ 			{
+ 				UniquePath *upath;
+ 
+ 				FLAT_COPY_PATH(upath, path, UniquePath);
+ 				upath->subpath = reparameterize_path_by_child(root,
+ 															  upath->subpath,
+ 															  child_rel);
+ 				upath->uniq_exprs = (List *) adjust_appendrel_attrs_multilevel(root,
+ 													(Node *) upath->uniq_exprs,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) upath;
+ 			}
+ 			break;
+ 
+ 		case T_GatherPath:
+ 			{
+ 				GatherPath *gpath;
+ 
+ 				FLAT_COPY_PATH(gpath, path, GatherPath);
+ 				gpath->subpath = reparameterize_path_by_child(root,
+ 															  gpath->subpath,
+ 															  child_rel);
+ 				new_path = (Path *) gpath;
+ 			}
+ 			break;
+ 
+ 		case T_GatherMergePath:
+ 			{
+ 				GatherMergePath *gmpath;
+ 
+ 				FLAT_COPY_PATH(gmpath, path, GatherMergePath);
+ 				gmpath->subpath = reparameterize_path_by_child(root,
+ 															   gmpath->subpath,
+ 															   child_rel);
+ 				new_path = (Path *) gmpath;
+ 			}
+ 			break;
+ 
+ 		case T_SubqueryScanPath:
+ 			/*
+ 			 * Subqueries can't be partitioned right now, so a subquery can not
+ 			 * participate in a partition-wise join and hence can not be seen
+ 			 * here.
+ 			 */
+ 		case T_ResultPath:
+ 			/*
+ 			 * A result path can not have any parameterization, so we
+ 			 * should never see it here.
+ 			 */
+ 		default:
+ 			/* Other kinds of paths can not appear in a join tree. */
+ 			elog(ERROR, "unrecognized path node type %d", (int) nodeTag(path));
+ 
+ 			/* Keep compiler quiet about unassigned new_path */
+ 			return NULL;
+ 	}
+ 
+ 	/*
+ 	 * Adjust the parameterization information, which refers to the topmost
+ 	 * parent. The topmost parent can be multiple levels away from the given
+ 	 * child, hence use multi-level expression adjustment routines.
+ 	 */
+ 	old_ppi = new_path->param_info;
+ 	required_outer = adjust_child_relids_multilevel(root,
+ 													old_ppi->ppi_req_outer,
+ 													child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 
+ 	/* If we already have a PPI for this parameterization, just reuse it */
+ 	new_ppi = find_param_path_info(new_path->parent, required_outer);
+ 
+ 	/*
+ 	 * If not, build a new one and link it to the list of PPIs. When called
+ 	 * during GEQO join planning, we are in a short-lived memory context.  We
+ 	 * must make sure that the new PPI and its contents attached to a baserel
+ 	 * survives the GEQO cycle, else the baserel is trashed for future GEQO
+ 	 * cycles.  On the other hand, when we are adding new PPI to a joinrel
+ 	 * during GEQO, we don't want that to clutter the main planning context.
+ 	 * Upshot is that the best solution is to explicitly allocate new PPI in
+ 	 * the same context the given RelOptInfo is in.
+ 	 */
+ 	if (!new_ppi)
+ 	{
+ 		MemoryContext oldcontext;
+ 		RelOptInfo   *rel = path->parent;
+ 
+ 		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+ 
+ 		new_ppi = makeNode(ParamPathInfo);
+ 		new_ppi->ppi_req_outer = bms_copy(required_outer);
+ 		new_ppi->ppi_rows = old_ppi->ppi_rows;
+ 		new_ppi->ppi_clauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 												 (Node *) old_ppi->ppi_clauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 		rel->ppilist = lappend(rel->ppilist, new_ppi);
+ 
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	bms_free(required_outer);
+ 
+ 	new_path->param_info = new_ppi;
+ 
+ 	/*
+ 	 * Adjust the path target if the parent of the outer relation is referenced
+ 	 * in the targetlist. This can happen when only the parent of outer relation is
+ 	 * laterally referenced in this relation.
+ 	 */
+ 	if (bms_overlap(path->parent->lateral_relids, child_rel->top_parent_relids))
+ 	{
+ 		List	   *exprs;
+ 
+ 		new_path->pathtarget = copy_pathtarget(new_path->pathtarget);
+ 		exprs = new_path->pathtarget->exprs;
+ 		exprs = (List *) adjust_appendrel_attrs_multilevel(root,
+ 														   (Node *) exprs,
+ 														   child_rel->relids,
+ 											   child_rel->top_parent_relids);
+ 		new_path->pathtarget->exprs = exprs;
+ 	}
+ 
+ 	return new_path;
+ }
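
Annotation on reparameterize_path_by_child() above: the function follows a "flat copy, then fix up" pattern. It returns the input untouched when no translation is needed, otherwise it copies the node at its concrete size, recurses into subpaths, and finally rewrites the parameterization on the copy. The sketch below is a minimal stand-alone illustration of that pattern only. All names in it (SketchPath, SketchWrapper, flat_copy, retarget_sketch) are hypothetical, and a single integer stands in for the Relids sets and ParamPathInfo used by the real code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node tags; the patch switches on nodeTag(path) instead. */
typedef enum { TAG_LEAF, TAG_WRAPPER } SketchTag;

/* Simplified base node: one integer replaces the real parameterization. */
typedef struct SketchPath
{
	SketchTag	tag;
	int			param_relid;
} SketchPath;

/* A derived node that embeds the base node first and owns one subpath. */
typedef struct SketchWrapper
{
	SketchPath	path;
	SketchPath *subpath;
} SketchWrapper;

/* Copy a node byte for byte at its concrete size, like FLAT_COPY_PATH. */
static void *
flat_copy(const void *node, size_t size)
{
	void	   *copy = malloc(size);

	if (copy == NULL)
		abort();
	memcpy(copy, node, size);
	return copy;
}

/*
 * Return a path parameterized by child_relid instead of parent_relid.
 * The input tree is never modified; untouched paths are returned as is.
 */
static SketchPath *
retarget_sketch(SketchPath *path, int parent_relid, int child_relid)
{
	SketchPath *new_path;

	assert(path != NULL);

	/* Not parameterized by the parent: nothing to translate. */
	if (path->param_relid != parent_relid)
		return path;

	switch (path->tag)
	{
		case TAG_LEAF:
			new_path = flat_copy(path, sizeof(SketchPath));
			break;

		case TAG_WRAPPER:
			{
				SketchWrapper *w = flat_copy(path, sizeof(SketchWrapper));

				/* Type-specific member: translate the subpath too. */
				w->subpath = retarget_sketch(w->subpath, parent_relid,
											 child_relid);
				new_path = &w->path;
			}
			break;

		default:
			return NULL;		/* unknown node type */
	}

	/* Common tail: fix the parameterization on the copy only. */
	new_path->param_relid = child_relid;
	return new_path;
}
```

Because only copies are written to, a parent path can be translated once per child relation without disturbing the original, which is presumably why the patch copies rather than edits in place.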
diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
new file mode 100644
index 698a387..6714288
*** a/src/backend/optimizer/util/placeholder.c
--- b/src/backend/optimizer/util/placeholder.c
***************
*** 20,25 ****
--- 20,26 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/placeholder.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
  #include "optimizer/var.h"
  #include "utils/lsyscache.h"
  
*************** add_placeholders_to_joinrel(PlannerInfo
*** 414,419 ****
--- 415,424 ----
  	Relids		relids = joinrel->relids;
  	ListCell   *lc;
  
+ 	/* This function is called only on the parent relations. */
+ 	Assert(!IS_OTHER_REL(joinrel) && !IS_OTHER_REL(outer_rel) &&
+ 		   !IS_OTHER_REL(inner_rel));
+ 
  	foreach(lc, root->placeholder_list)
  	{
  		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
*************** add_placeholders_to_joinrel(PlannerInfo
*** 459,461 ****
--- 464,518 ----
  		}
  	}
  }
+ 
+ /*
+  * add_placeholders_to_child_joinrel
+  *		Translate the PHVs in parent's targetlist and add them to the child's
+  *		targetlist. Also adjust the cost and width of the child's targetlist.
+  */
+ void
+ add_placeholders_to_child_joinrel(PlannerInfo *root, RelOptInfo *childrel,
+ 								  RelOptInfo *parentrel)
+ {
+ 	ListCell  *lc;
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 
+ 
+ 	Assert(IS_JOIN_REL(childrel) && IS_JOIN_REL(parentrel));
+ 
+ 	/* Ensure the child relation really is what it claims to be. */
+ 	Assert(IS_OTHER_REL(childrel));
+ 
+ 	appinfos = find_appinfos_by_relids(root, childrel->relids, &nappinfos);
+ 	foreach (lc, parentrel->reltarget->exprs)
+ 	{
+ 		PlaceHolderVar *phv = lfirst(lc);
+ 
+ 		if (IsA(phv, PlaceHolderVar))
+ 		{
+ 			/*
+ 			 * In case the placeholder Var refers to any of the parent
+ 			 * relations, translate it to refer to the corresponding child.
+ 			 */
+ 			if (bms_overlap(phv->phrels, parentrel->relids) &&
+ 				childrel->reloptkind == RELOPT_OTHER_JOINREL)
+ 			{
+ 				phv = (PlaceHolderVar *) adjust_appendrel_attrs(root,
+ 															  (Node *) phv,
+ 																 nappinfos,
+ 																 appinfos);
+ 			}
+ 
+ 			childrel->reltarget->exprs = lappend(childrel->reltarget->exprs,
+ 												 phv);
+ 		}
+ 	}
+ 
+ 	/* Adjust the cost and width of child targetlist. */
+ 	childrel->reltarget->cost.startup = parentrel->reltarget->cost.startup;
+ 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
+ 	childrel->reltarget->width = parentrel->reltarget->width;
+ 
+ 	pfree(appinfos);
+ }
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
new file mode 100644
index 9207c8d..7e846e1
*** a/src/backend/optimizer/util/plancat.c
--- b/src/backend/optimizer/util/plancat.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/catalog.h"
  #include "catalog/dependency.h"
  #include "catalog/heap.h"
+ #include "catalog/pg_inherits_fn.h"
  #include "catalog/partition.h"
  #include "catalog/pg_am.h"
  #include "catalog/pg_statistic_ext.h"
*************** static List *get_relation_constraints(Pl
*** 68,73 ****
--- 69,80 ----
  static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
  				  Relation heapRelation);
  static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+ static List **build_baserel_partition_key_exprs(Relation relation,
+ 												Index varno);
+ static PartitionScheme find_partition_scheme(struct PlannerInfo *root,
+ 											 Relation rel);
+ static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ 							Relation relation);
  
  /*
   * get_relation_info -
*************** get_relation_info(PlannerInfo *root, Oid
*** 420,425 ****
--- 427,436 ----
  	/* Collect info about relation's foreign keys, if relevant */
  	get_relation_foreign_keys(root, rel, relation, inhparent);
  
+ 	/* Collect info about relation's partitioning scheme, if any. */
+ 	if (inhparent)
+ 		get_relation_partition_info(root, rel, relation);
+ 
  	heap_close(relation, NoLock);
  
  	/*
*************** has_row_triggers(PlannerInfo *root, Inde
*** 1801,1803 ****
--- 1812,1975 ----
  	heap_close(relation, NoLock);
  	return result;
  }
+ 
+ /*
+  * get_relation_partition_info
+  *
+  * Retrieves partitioning information for a given relation.
+  *
+  * Partitioning scheme, partition key expressions and OIDs of partitions are
+  * added to the given RelOptInfo. A partitioned table can participate in the
+  * query as a simple relation or an inheritance parent. Only the latter can have
+  * child relations, and hence partitions. From the point of view of the query
+  * optimizer only such relations are considered to be partitioned. Hence
+  * partitioning information is set only for an inheritance parent.
+  */
+ static void
+ get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ 							Relation relation)
+ {
+ 	PartitionDesc	part_desc = RelationGetPartitionDesc(relation);
+ 
+ 	/* No partitioning information for an unpartitioned relation. */
+ 	if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE ||
+ 		!(rel->part_scheme = find_partition_scheme(root, relation)))
+ 		return;
+ 
+ 	Assert(part_desc);
+ 	rel->nparts = part_desc->nparts;
+ 	rel->boundinfo = part_desc->boundinfo;
+ 	rel->partexprs = build_baserel_partition_key_exprs(relation, rel->relid);
+ 	rel->part_oids = part_desc->oids;
+ 
+ 	Assert(rel->nparts > 0 && rel->boundinfo && rel->part_oids);
+ 	return;
+ }
+ 
+ /*
+  * find_partition_scheme
+  *
+  * The function returns a canonical partition scheme which exactly matches the
+  * partitioning properties of the given relation if one exists in the list of
+  * canonical partitioning schemes maintained in PlannerInfo. If none of the
+  * existing partitioning schemes match, the function creates a canonical
+  * partition scheme and adds it to the list.
+  *
+  * For an unpartitioned table or for a multi-level partitioned table it returns
+  * NULL. See comments in the function for more details.
+  */
+ static PartitionScheme
+ find_partition_scheme(PlannerInfo *root, Relation relation)
+ {
+ 	PartitionKey	part_key = RelationGetPartitionKey(relation);
+ 	ListCell	   *lc;
+ 	int		partnatts;
+ 	PartitionScheme	part_scheme = NULL;
+ 
+ 	/* No partition scheme for an unpartitioned relation. */
+ 	if (!part_key)
+ 		return NULL;
+ 
+ 	partnatts = part_key->partnatts;
+ 
+ 	/* Search for a matching partition scheme and return it if found. */
+ 	foreach (lc, root->part_schemes)
+ 	{
+ 		part_scheme = lfirst(lc);
+ 
+ 		/* Match partitioning strategy and number of keys. */
+ 		if (part_key->strategy != part_scheme->strategy ||
+ 			partnatts != part_scheme->partnatts)
+ 			continue;
+ 
+ 		/* Match the partition key types. */
+ 		if (memcmp(part_key->partopfamily, part_scheme->partopfamily,
+ 				   sizeof(Oid) * partnatts) != 0 ||
+ 			memcmp(part_key->partopcintype, part_scheme->partopcintype,
+ 				   sizeof(Oid) * partnatts) != 0 ||
+ 			memcmp(part_key->parttypcoll, part_scheme->parttypcoll,
+ 				   sizeof(Oid) * partnatts) != 0)
+ 			continue;
+ 
+ 		/* Found matching partition scheme. */
+ 		return part_scheme;
+ 	}
+ 
+ 	/* Did not find matching partition scheme. Create one. */
+ 	part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData));
+ 
+ 	part_scheme->strategy = part_key->strategy;
+ 	/* Store partition key information. */
+ 	part_scheme->partnatts = part_key->partnatts;
+ 	part_scheme->partopfamily = part_key->partopfamily;
+ 	part_scheme->partopcintype = part_key->partopcintype;
+ 	part_scheme->parttypcoll = part_key->parttypcoll;
+ 	part_scheme->partsupfunc = part_key->partsupfunc;
+ 
+ 	/* Add the partitioning scheme to PlannerInfo. */
+ 	root->part_schemes = lappend(root->part_schemes, part_scheme);
+ 
+ 	return part_scheme;
+ }
+ 
+ /*
+  * build_baserel_partition_key_exprs
+  *
+  * Collect partition key expressions for a given base relation. The function
+  * converts any single column partition keys into corresponding Var nodes. It
+  * restamps Var nodes in partition key expressions by given varno. The
+  * partition key expressions are returned as an array of single element lists
+  * to be stored in RelOptInfo of the base relation.
+  */
+ static List **
+ build_baserel_partition_key_exprs(Relation relation, Index varno)
+ {
+ 	PartitionKey	part_key = RelationGetPartitionKey(relation);
+ 	int		num_pkexprs;
+ 	int		cnt_pke;
+ 	List	  **partexprs;
+ 	ListCell   *lc;
+ 
+ 	if (!part_key || part_key->partnatts <= 0)
+ 		return NULL;
+ 
+ 	num_pkexprs = part_key->partnatts;
+ 	partexprs = (List **) palloc(sizeof(List *) * num_pkexprs);
+ 	lc = list_head(part_key->partexprs);
+ 
+ 	for (cnt_pke = 0; cnt_pke < num_pkexprs; cnt_pke++)
+ 	{
+ 		AttrNumber attno = part_key->partattrs[cnt_pke];
+ 		Expr	  *pkexpr;
+ 
+ 		if (attno != InvalidAttrNumber)
+ 		{
+ 			/* Single column partition key is stored as a Var node. */
+ 			Form_pg_attribute att_tup;
+ 
+ 			if (attno < 0)
+ 				att_tup = SystemAttributeDefinition(attno,
+ 												 relation->rd_rel->relhasoids);
+ 			else
+ 				att_tup = relation->rd_att->attrs[attno - 1];
+ 
+ 			pkexpr = (Expr *) makeVar(varno, attno, att_tup->atttypid,
+ 									  att_tup->atttypmod,
+ 									  att_tup->attcollation, 0);
+ 		}
+ 		else
+ 		{
+ 			if (lc == NULL)
+ 				elog(ERROR, "wrong number of partition key expressions");
+ 
+ 			/* Re-stamp the expression with given varno. */
+ 			pkexpr = (Expr *) copyObject(lfirst(lc));
+ 			ChangeVarNodes((Node *) pkexpr, 1, varno, 0);
+ 			lc = lnext(lc);
+ 		}
+ 
+ 		partexprs[cnt_pke] = list_make1(pkexpr);
+ 	}
+ 
+ 	return partexprs;
+ }
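
Annotation on find_partition_scheme() above: the lookup keeps one canonical scheme object per distinct combination of strategy, key count and per-key OID arrays, so two relations partitioned the same way end up sharing one pointer. The following is a minimal stand-alone sketch of that idea, not the patch's code. SchemeSketch, SchemeList, MAX_SCHEMES and find_scheme_sketch are hypothetical names, a fixed-size array replaces the planner's List, and malloc replaces palloc0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical, simplified stand-in for the patch's PartitionSchemeData. */
typedef struct SchemeSketch
{
	char		strategy;		/* e.g. 'l' for list, 'r' for range */
	int			partnatts;		/* number of partition key columns */
	const Oid  *partopfamily;	/* one entry per key column */
	const Oid  *partopcintype;
	const Oid  *parttypcoll;
} SchemeSketch;

#define MAX_SCHEMES 16

/* Stand-in for the list of canonical schemes kept in PlannerInfo. */
typedef struct SchemeList
{
	SchemeSketch *items[MAX_SCHEMES];
	int			n;
} SchemeList;

/* Compare two OID arrays of length n element by element. */
static int
same_oids(const Oid *a, const Oid *b, int n)
{
	return memcmp(a, b, sizeof(Oid) * (size_t) n) == 0;
}

/*
 * Return the canonical scheme matching key, creating and registering one
 * if none of the known schemes match.
 */
static SchemeSketch *
find_scheme_sketch(SchemeList *known, const SchemeSketch *key)
{
	SchemeSketch *fresh;
	int			i;

	for (i = 0; i < known->n; i++)
	{
		SchemeSketch *s = known->items[i];

		/* Cheap scalar checks first, then the per-key arrays. */
		if (s->strategy != key->strategy || s->partnatts != key->partnatts)
			continue;
		if (!same_oids(s->partopfamily, key->partopfamily, key->partnatts) ||
			!same_oids(s->partopcintype, key->partopcintype, key->partnatts) ||
			!same_oids(s->parttypcoll, key->parttypcoll, key->partnatts))
			continue;
		return s;
	}

	assert(known->n < MAX_SCHEMES);
	fresh = malloc(sizeof(SchemeSketch));
	if (fresh == NULL)
		abort();
	*fresh = *key;				/* shallow copy, as in the patch */
	known->items[known->n++] = fresh;
	return fresh;
}
```

With canonical objects, a later check such as "do these two relations share a partitioning scheme?" can be a plain pointer comparison instead of repeating the field-by-field match.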
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
new file mode 100644
index 342d884..308bdec
*** a/src/backend/optimizer/util/relnode.c
--- b/src/backend/optimizer/util/relnode.c
***************
*** 23,30 ****
--- 23,32 ----
  #include "optimizer/paths.h"
  #include "optimizer/placeholder.h"
  #include "optimizer/plancat.h"
+ #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/tlist.h"
+ #include "optimizer/var.h"
  #include "utils/hsearch.h"
  
  
*************** typedef struct JoinHashEntry
*** 35,41 ****
  } JoinHashEntry;
  
  static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel);
  static List *build_joinrel_restrictlist(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel,
--- 37,43 ----
  } JoinHashEntry;
  
  static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 								RelOptInfo *input_rel, bool grouped);
  static List *build_joinrel_restrictlist(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel,
*************** static List *subbuild_joinrel_joinlist(R
*** 52,57 ****
--- 54,64 ----
  static void set_foreign_rel_properties(RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
  static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
+ extern ParamPathInfo *find_param_path_info(RelOptInfo *rel,
+ 									  Relids required_outer);
+ static void build_joinrel_partition_info(RelOptInfo *joinrel,
+ 							 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+ 							 List *restrictlist, JoinType jointype);
  
  
  /*
*************** build_simple_rel(PlannerInfo *root, int
*** 120,125 ****
--- 127,133 ----
  	rel->cheapest_parameterized_paths = NIL;
  	rel->direct_lateral_relids = NULL;
  	rel->lateral_relids = NULL;
+ 	rel->gpi = NULL;
  	rel->relid = relid;
  	rel->rtekind = rte->rtekind;
  	/* min_attr, max_attr, attr_needed, attr_widths are set below */
*************** build_simple_rel(PlannerInfo *root, int
*** 146,151 ****
--- 154,164 ----
  	rel->baserestrict_min_security = UINT_MAX;
  	rel->joininfo = NIL;
  	rel->has_eclass_joins = false;
+ 	rel->part_scheme = NULL;
+ 	rel->nparts = 0;
+ 	rel->boundinfo = NULL;
+ 	rel->partexprs = NULL;
+ 	rel->part_rels = NULL;
  
  	/*
  	 * Pass top parent's relids down the inheritance hierarchy. If the parent
*************** build_simple_rel(PlannerInfo *root, int
*** 218,237 ****
  	if (rte->inh)
  	{
  		ListCell   *l;
  
  		foreach(l, root->append_rel_list)
  		{
  			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
  
  			/* append_rel_list contains all append rels; ignore others */
  			if (appinfo->parent_relid != relid)
  				continue;
  
! 			(void) build_simple_rel(root, appinfo->child_relid,
! 									rel);
  		}
  	}
  
  	return rel;
  }
  
--- 231,293 ----
  	if (rte->inh)
  	{
  		ListCell   *l;
+ 		int			nparts = rel->nparts;
+ 
+ 		if (nparts > 0)
+ 			rel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
  
  		foreach(l, root->append_rel_list)
  		{
  			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+ 			RelOptInfo *childrel;
+ 			int			cnt_parts;
+ 			RangeTblEntry *childRTE;
  
  			/* append_rel_list contains all append rels; ignore others */
  			if (appinfo->parent_relid != relid)
  				continue;
  
! 			childrel = build_simple_rel(root, appinfo->child_relid,
! 										 rel);
! 
! 			/* Nothing more to do for an unpartitioned table. */
! 			if (!rel->part_scheme)
! 				continue;
! 
! 			childRTE = root->simple_rte_array[appinfo->child_relid];
! 			/*
! 			 * Two partitioned tables with the same partitioning scheme have
! 			 * their partition bounds arranged in the same order. The order of
! 			 * partition OIDs in RelOptInfo corresponds to the partition bound
! 			 * order. Thus the OIDs of matching partitions from both the tables
! 			 * are placed at the same position in the array of partition OIDs
! 			 * in the respective RelOptInfos. Arranging RelOptInfos of
! 			 * partitions in the same order as their OIDs makes it easy to find
! 			 * the RelOptInfos of matching partitions for partition-wise join.
! 			 */
! 			for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
! 			{
! 				if (rel->part_oids[cnt_parts] == childRTE->relid)
! 				{
! 					Assert(!rel->part_rels[cnt_parts]);
! 					rel->part_rels[cnt_parts] = childrel;
! 					break;
! 				}
! 			}
  		}
  	}
  
+ 	/* Should have found all the childrels of a partitioned relation. */
+ 	if (rel->part_scheme)
+ 	{
+ 		int		cnt_parts;
+ 
+ 		for (cnt_parts = 0; cnt_parts < rel->nparts; cnt_parts++)
+ 			if (!rel->part_rels[cnt_parts])
+ 				elog(ERROR, "could not find the RelOptInfo of a partition with oid %u",
+ 					 rel->part_oids[cnt_parts]);
+ 	}
+ 
  	return rel;
  }
  
*************** build_join_rel(PlannerInfo *root,
*** 453,458 ****
--- 509,517 ----
  	RelOptInfo *joinrel;
  	List	   *restrictlist;
  
+ 	/* This function should be used only for join between parents. */
+ 	Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));
+ 
  	/*
  	 * See if we already have a joinrel for this set of base rels.
  	 */
*************** build_join_rel(PlannerInfo *root,
*** 497,502 ****
--- 556,562 ----
  				  inner_rel->direct_lateral_relids);
  	joinrel->lateral_relids = min_join_parameterization(root, joinrel->relids,
  														outer_rel, inner_rel);
+ 	joinrel->gpi = NULL;
  	joinrel->relid = 0;			/* indicates not a baserel */
  	joinrel->rtekind = RTE_JOIN;
  	joinrel->min_attr = 0;
*************** build_join_rel(PlannerInfo *root,
*** 527,532 ****
--- 587,597 ----
  	joinrel->joininfo = NIL;
  	joinrel->has_eclass_joins = false;
  	joinrel->top_parent_relids = NULL;
+ 	joinrel->part_scheme = NULL;
+ 	joinrel->nparts = 0;
+ 	joinrel->boundinfo = NULL;
+ 	joinrel->partexprs = NULL;
+ 	joinrel->part_rels = NULL;
  
  	/* Compute information relevant to the foreign relations. */
  	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
*************** build_join_rel(PlannerInfo *root,
*** 539,548 ****
  	 * and inner rels we first try to build it from.  But the contents should
  	 * be the same regardless.
  	 */
! 	build_joinrel_tlist(root, joinrel, outer_rel);
! 	build_joinrel_tlist(root, joinrel, inner_rel);
  	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
  
  	/*
  	 * add_placeholders_to_joinrel also took care of adding the ph_lateral
  	 * sets of any PlaceHolderVars computed here to direct_lateral_relids, so
--- 604,620 ----
  	 * and inner rels we first try to build it from.  But the contents should
  	 * be the same regardless.
  	 */
! 	build_joinrel_tlist(root, joinrel, outer_rel, false);
! 	build_joinrel_tlist(root, joinrel, inner_rel, false);
  	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
  
+ 	/* Try to build grouped target. */
+ 	/*
+ 	 * TODO Consider if placeholders make sense here. If not, also make the
+ 	 * related code below conditional.
+ 	 */
+ 	prepare_rel_for_grouping(root, joinrel);
+ 
  	/*
  	 * add_placeholders_to_joinrel also took care of adding the ph_lateral
  	 * sets of any PlaceHolderVars computed here to direct_lateral_relids, so
*************** build_join_rel(PlannerInfo *root,
*** 572,577 ****
--- 644,653 ----
  	 */
  	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
  
+ 	/* Store the partition information. */
+ 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
+ 								 sjinfo->jointype);
+ 
  	/*
  	 * Set estimates of the joinrel's size.
  	 */
*************** build_join_rel(PlannerInfo *root,
*** 617,622 ****
--- 693,845 ----
  	return joinrel;
  }
  
+ /*
+  * build_child_join_rel
+  *		Builds RelOptInfo for joining given two child relations from RelOptInfo
+  *		representing the join between their parents.
+  *
+  * 'outer_rel' and 'inner_rel' are the RelOptInfos of child relations being
+  *		joined.
+  * 'parent_joinrel' is the RelOptInfo representing the join between parent
+  *		relations. Most of the members of new RelOptInfo are produced by
+  *		translating corresponding members of this RelOptInfo.
+  * 'sjinfo': context info for child join
+  * 'restrictlist': list of RestrictInfo nodes that apply to this particular
+  *		pair of joinable relations.
+  * 'jointype': the type of join being performed between the two child
+  *		relations.
+  */
+ RelOptInfo *
+ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
+ 					 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
+ 					 List *restrictlist, SpecialJoinInfo *sjinfo,
+ 					 JoinType jointype)
+ {
+ 	RelOptInfo *joinrel = makeNode(RelOptInfo);
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 
+ 	/* Only joins between other relations land here. */
+ 	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));
+ 
+ 	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
+ 	joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids);
+ 	joinrel->rows = 0;
+ 	/* cheap startup cost is interesting iff not all tuples to be retrieved */
+ 	joinrel->consider_startup = (root->tuple_fraction > 0);
+ 	joinrel->consider_param_startup = false;
+ 	joinrel->consider_parallel = false;
+ 	joinrel->reltarget = create_empty_pathtarget();
+ 	joinrel->pathlist = NIL;
+ 	joinrel->ppilist = NIL;
+ 	joinrel->partial_pathlist = NIL;
+ 	joinrel->cheapest_startup_path = NULL;
+ 	joinrel->cheapest_total_path = NULL;
+ 	joinrel->cheapest_unique_path = NULL;
+ 	joinrel->cheapest_parameterized_paths = NIL;
+ 	joinrel->direct_lateral_relids = NULL;
+ 	joinrel->lateral_relids = NULL;
+ 	joinrel->gpi = makeNode(GroupedPathInfo);
+ 	if (parent_joinrel->gpi)
+ 		/*
+ 		 * Translation into child varnos will take place along with other
+ 		 * translations, see try_partition_wise_join.
+ 		 */
+ 		joinrel->gpi->target = copy_pathtarget(parent_joinrel->gpi->target);
+ 	joinrel->relid = 0;			/* indicates not a baserel */
+ 	joinrel->rtekind = RTE_JOIN;
+ 	joinrel->min_attr = 0;
+ 	joinrel->max_attr = 0;
+ 	joinrel->attr_needed = NULL;
+ 	joinrel->attr_widths = NULL;
+ 	joinrel->lateral_vars = NIL;
+ 	joinrel->lateral_referencers = NULL;
+ 	joinrel->indexlist = NIL;
+ 	joinrel->pages = 0;
+ 	joinrel->tuples = 0;
+ 	joinrel->allvisfrac = 0;
+ 	joinrel->subroot = NULL;
+ 	joinrel->subplan_params = NIL;
+ 	joinrel->serverid = InvalidOid;
+ 	joinrel->userid = InvalidOid;
+ 	joinrel->useridiscurrent = false;
+ 	joinrel->fdwroutine = NULL;
+ 	joinrel->fdw_private = NULL;
+ 	joinrel->baserestrictinfo = NIL;
+ 	joinrel->baserestrictcost.startup = 0;
+ 	joinrel->baserestrictcost.per_tuple = 0;
+ 	joinrel->joininfo = NIL;
+ 	joinrel->has_eclass_joins = false;
+ 	joinrel->top_parent_relids = NULL;
+ 	joinrel->part_scheme = NULL;
+ 	joinrel->part_rels = NULL;
+ 	joinrel->partexprs = NULL;
+ 
+ 	joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
+ 										   inner_rel->top_parent_relids);
+ 
+ 	/* Compute information relevant to foreign relations. */
+ 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
+ 
+ 	/* Build targetlist */
+ 	build_joinrel_tlist(root, joinrel, outer_rel, false);
+ 	build_joinrel_tlist(root, joinrel, inner_rel, false);
+ 	/* Add placeholder variables. */
+ 	add_placeholders_to_child_joinrel(root, joinrel, parent_joinrel);
+ 
+ 	/* Try to build grouped target. */
+ 	/*
+ 	 * TODO Consider if placeholders make sense here. If not, also make the
+ 	 * related code below conditional.
+ 	 */
+ 	prepare_rel_for_grouping(root, joinrel);
+ 
+ 
+ 	/* Construct joininfo list. */
+ 	appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
+ 	joinrel->joininfo = (List *) adjust_appendrel_attrs(root,
+ 											 (Node *) parent_joinrel->joininfo,
+ 																	 nappinfos,
+ 																	 appinfos);
+ 	pfree(appinfos);
+ 
+ 	/*
+ 	 * Lateral relids referred to in the child join will be the same as those
+ 	 * referred to in the parent relation. Throw away any partial result
+ 	 * computed while building the targetlist.
+ 	 */
+ 	bms_free(joinrel->direct_lateral_relids);
+ 	bms_free(joinrel->lateral_relids);
+ 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
+ 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
+ 
+ 	/*
+ 	 * If the parent joinrel has pending equivalence classes, so does the
+ 	 * child.
+ 	 */
+ 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
+ 
+ 	/* Is the join between partitions itself partitioned? */
+ 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
+ 								 jointype);
+ 
+ 	/* Child joinrel is parallel safe if parent is parallel safe. */
+ 	joinrel->consider_parallel = parent_joinrel->consider_parallel;
+ 
+ 
+ 	/* Set estimates of the child-joinrel's size. */
+ 	set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
+ 							   sjinfo, restrictlist);
+ 
+ 	/* We build the join only once. */
+ 	Assert(!find_join_rel(root, joinrel->relids));
+ 
+ 	/* Add the relation to the PlannerInfo. */
+ 	add_join_rel(root, joinrel);
+ 
+ 	return joinrel;
+ }
+ 
  /*
   * min_join_parameterization
   *
*************** min_join_parameterization(PlannerInfo *r
*** 670,679 ****
   */
  static void
  build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel)
  {
! 	Relids		relids = joinrel->relids;
  	ListCell   *vars;
  
  	foreach(vars, input_rel->reltarget->exprs)
  	{
--- 893,932 ----
   */
  static void
  build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel, bool grouped)
  {
! 	Relids		relids;
! 	PathTarget  *input_target, *result;
  	ListCell   *vars;
+ 	int			i = -1;
+ 
+ 	/* attr_needed refers to parent relids and not those of a child. */
+ 	if (joinrel->top_parent_relids)
+ 		relids = joinrel->top_parent_relids;
+ 	else
+ 		relids = joinrel->relids;
+ 
+  	if (!grouped)
+  	{
+  		input_target = input_rel->reltarget;
+  		result = joinrel->reltarget;
+  	}
+  	else
+  	{
+  		if (input_rel->gpi != NULL)
+  		{
+  			input_target = input_rel->gpi->target;
+  			Assert(input_target != NULL);
+  		}
+  		else
+  			input_target = input_rel->reltarget;
+ 
+  		/* Caller should have initialized this. */
+  		Assert(joinrel->gpi != NULL);
+ 
+  		/* Default to the plain target. */
+  		result = joinrel->gpi->target;
+  	}
  
  	foreach(vars, input_rel->reltarget->exprs)
  	{
*************** build_joinrel_tlist(PlannerInfo *root, R
*** 690,713 ****
  
  		/*
  		 * Otherwise, anything in a baserel or joinrel targetlist ought to be
! 		 * a Var.  (More general cases can only appear in appendrel child
! 		 * rels, which will never be seen here.)
  		 */
! 		if (!IsA(var, Var))
  			elog(ERROR, "unexpected node type in rel targetlist: %d",
  				 (int) nodeTag(var));
  
- 		/* Get the Var's original base rel */
- 		baserel = find_base_rel(root, var->varno);
- 
- 		/* Is it still needed above this joinrel? */
- 		ndx = var->varattno - baserel->min_attr;
  		if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
  		{
  			/* Yup, add it to the output */
! 			joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, var);
! 			/* Vars have cost zero, so no need to adjust reltarget->cost */
! 			joinrel->reltarget->width += baserel->attr_widths[ndx];
  		}
  	}
  }
--- 943,1009 ----
  
  		/*
  		 * Otherwise, anything in a baserel or joinrel targetlist ought to be
! 		 * a Var or ConvertRowtypeExpr introduced while translating parent
! 		 * targetlist to that of the child.
  		 */
! 		if (IsA(var, Var))
! 		{
! 			/* Get the Var's original base rel */
! 			baserel = find_base_rel(root, var->varno);
! 
! 			/* Is it still needed above this joinrel? */
! 			ndx = var->varattno - baserel->min_attr;
! 		}
! 		else if (IsA(var, ConvertRowtypeExpr))
! 		{
! 			ConvertRowtypeExpr *child_expr = (ConvertRowtypeExpr *) var;
! 			Var	 *childvar = (Var *) child_expr->arg;
! 
! 			/*
! 			 * Child's whole-row references are converted to that of parent
! 			 * using ConvertRowtypeExpr. There can be as many
! 			 * ConvertRowtypeExpr decorations as the depth of partition tree.
! 			 * The argument to deepest ConvertRowtypeExpr is expected to be a
! 			 * whole-row reference of the child.
! 			 */
! 			while (IsA(childvar, ConvertRowtypeExpr))
! 			{
! 				child_expr = (ConvertRowtypeExpr *) childvar;
! 				childvar = (Var *) child_expr->arg;
! 			}
! 			Assert(IsA(childvar, Var) && childvar->varattno == 0);
! 
! 			baserel = find_base_rel(root, childvar->varno);
! 			ndx = 0 - baserel->min_attr;
! 		}
! 		else
  			elog(ERROR, "unexpected node type in rel targetlist: %d",
  				 (int) nodeTag(var));
  
  		if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
  		{
+ 			Index sortgroupref = 0;
+ 
  			/* Yup, add it to the output */
! 			if (input_target->sortgrouprefs)
! 				sortgroupref = input_target->sortgrouprefs[i];
! 
! 			/*
! 			 * Even if not used for grouping in the input path (the input path
! 			 * is not necessarily grouped), it might be useful for grouping
! 			 * higher in the join tree.
! 			 */
! 			if (sortgroupref == 0)
! 				sortgroupref = get_expr_sortgroupref(root, (Expr *) var);
! 
! 			add_column_to_pathtarget(result, (Expr *) var, sortgroupref);
! 
! 			/*
! 			 * Vars have cost zero, so no need to adjust reltarget->cost. Even
! 			 * if it's a ConvertRowtypeExpr, it will be computed only for the
! 			 * base relation, costing nothing for a join.
! 			 */
! 			result->width += baserel->attr_widths[ndx];
  		}
  	}
  }
*************** subbuild_joinrel_joinlist(RelOptInfo *jo
*** 843,848 ****
--- 1139,1147 ----
  {
  	ListCell   *l;
  
+ 	/* Expected to be called only for join between parent relations. */
+ 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
+ 
  	foreach(l, joininfo_list)
  	{
  		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1048,1059 ****
  	Assert(!bms_overlap(baserel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, baserel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/*
  	 * Identify all joinclauses that are movable to this base rel given this
--- 1347,1354 ----
  	Assert(!bms_overlap(baserel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(baserel, required_outer)))
! 		return ppi;
  
  	/*
  	 * Identify all joinclauses that are movable to this base rel given this
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1095,1100 ****
--- 1390,1545 ----
  }
  
  /*
+  * If the relation can produce grouped paths, create GroupedPathInfo for it
+  * and create target for the grouped paths.
+  */
+ void
+ prepare_rel_for_grouping(PlannerInfo *root, RelOptInfo *rel)
+ {
+ 	List	*rel_aggregates;
+ 	Relids	rel_agg_attrs = NULL;
+ 	List	*rel_agg_vars = NIL;
+ 	bool	found_higher;
+ 	ListCell	*lc;
+ 	PathTarget	*target_grouped;
+ 
+ 	if (rel->relid > 0)
+ 	{
+ 		RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+ 
+ 		/*
+ 		 * rtekind != RTE_RELATION case is not supported yet.
+ 		 */
+ 		if (rte->rtekind != RTE_RELATION)
+ 			return;
+ 	}
+ 
+ 	/* Caller should only pass base relations or joins. */
+ 	Assert(rel->reloptkind == RELOPT_BASEREL ||
+ 		   rel->reloptkind == RELOPT_JOINREL ||
+ 		   rel->reloptkind == RELOPT_OTHER_JOINREL);
+ 
+ 	/*
+ 	 * If any outer join can set the attribute value to NULL, the aggregate
+ 	 * would receive different input at the base rel level.
+ 	 *
+ 	 * TODO For RELOPT_JOINREL, do not return if all the joins that can set
+ 	 * any entry of the grouped target (do we need to postpone this check
+ 	 * until the grouped target is available, and should create_grouped_target
+ 	 * take care?) of this rel to NULL are provably below rel. (It's ok if rel
+ 	 * is one of these joins.)
+ 	 */
+ 	if (bms_overlap(rel->relids, root->nullable_baserels))
+ 		return;
+ 
+ 	/*
+ 	 * Check if some aggregates can be evaluated in this relation's target,
+ 	 * and collect all vars referenced by these aggregates.
+ 	 */
+ 	rel_aggregates = NIL;
+ 	found_higher = false;
+ 	foreach(lc, root->grouped_var_list)
+ 	{
+ 		GroupedVarInfo	*gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 
+ 		/*
+ 		 * The subset includes gv_eval_at uninitialized, which typically means
+ 		 * Aggref.aggstar.
+ 		 */
+ 		if (bms_is_subset(gvi->gv_eval_at, rel->relids))
+ 		{
+ 			Aggref	*aggref = castNode(Aggref, gvi->gvexpr);
+ 
+ 			/*
+ 			 * Accept the aggregate.
+ 			 *
+ 			 * GroupedVarInfo is more convenient for the next processing than
+ 			 * Aggref, see add_aggregates_to_grouped_target.
+ 			 */
+ 			rel_aggregates = lappend(rel_aggregates, gvi);
+ 
+ 			if (rel->relid > 0)
+ 			{
+ 				/*
+ 				 * Simple relation. Collect attributes referenced by the
+ 				 * aggregate arguments.
+ 				 */
+ 				pull_varattnos((Node *) aggref, rel->relid, &rel_agg_attrs);
+ 			}
+ 			else
+ 			{
+ 				List	*agg_vars;
+ 
+ 				/*
+ 				 * Join. Collect vars referenced by the aggregate
+ 				 * arguments.
+ 				 */
+ 				/*
+ 				 * TODO Can any argument contain PHVs? And if so, does it matter?
+ 				 * Consider PVC_INCLUDE_PLACEHOLDERS | PVC_RECURSE_PLACEHOLDERS.
+ 				 */
+ 				agg_vars = pull_var_clause((Node *) aggref,
+ 										   PVC_RECURSE_AGGREGATES);
+ 				rel_agg_vars = list_concat(rel_agg_vars, agg_vars);
+ 			}
+ 		}
+ 		else if (bms_overlap(gvi->gv_eval_at, rel->relids))
+ 		{
+ 			/*
+ 			 * Remember that there is at least one aggregate that needs more
+ 			 * than this rel.
+ 			 */
+ 			found_higher = true;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Grouping makes little sense w/o aggregate function.
+ 	 */
+ 	if (rel_aggregates == NIL)
+ 	{
+ 		bms_free(rel_agg_attrs);
+ 		return;
+ 	}
+ 
+ 	if (found_higher)
+ 	{
+ 		/*
+ 		 * If some aggregate(s) need only this rel but some others need
+ 		 * multiple relations including the current one, grouping of the
+ 		 * current rel could steal some input variables from the "higher
+ 		 * aggregate" (besides decreasing the number of input rows).
+ 		 */
+ 		list_free(rel_aggregates);
+ 		bms_free(rel_agg_attrs);
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * If rel->reltarget can be used for aggregation, mark the relation as
+ 	 * capable of grouping.
+ 	 */
+ 	Assert(rel->gpi == NULL);
+ 	target_grouped = create_grouped_target(root, rel, rel_agg_attrs,
+ 										   rel_agg_vars);
+ 	if (target_grouped != NULL)
+ 	{
+ 		GroupedPathInfo	*gpi;
+ 
+ 		gpi = makeNode(GroupedPathInfo);
+ 		gpi->target = copy_pathtarget(target_grouped);
+ 		gpi->pathlist = NIL;
+ 		gpi->partial_pathlist = NIL;
+ 		rel->gpi = gpi;
+ 
+ 		/*
+ 		 * Add aggregates (in the form of GroupedVar) to the target.
+ 		 */
+ 		add_aggregates_to_target(root, gpi->target, rel_aggregates, rel);
+ 	}
+ }
+ 
+ /*
   * get_joinrel_parampathinfo
   *		Get the ParamPathInfo for a parameterized path for a join relation,
   *		constructing one if we don't have one already.
*************** get_joinrel_parampathinfo(PlannerInfo *r
*** 1290,1301 ****
  	*restrict_clauses = list_concat(pclauses, *restrict_clauses);
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, joinrel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/* Estimate the number of rows returned by the parameterized join */
  	rows = get_parameterized_joinrel_size(root, joinrel,
--- 1735,1742 ----
  	*restrict_clauses = list_concat(pclauses, *restrict_clauses);
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(joinrel, required_outer)))
! 		return ppi;
  
  	/* Estimate the number of rows returned by the parameterized join */
  	rows = get_parameterized_joinrel_size(root, joinrel,
*************** ParamPathInfo *
*** 1334,1340 ****
  get_appendrel_parampathinfo(RelOptInfo *appendrel, Relids required_outer)
  {
  	ParamPathInfo *ppi;
- 	ListCell   *lc;
  
  	/* Unparameterized paths have no ParamPathInfo */
  	if (bms_is_empty(required_outer))
--- 1775,1780 ----
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1343,1354 ****
  	Assert(!bms_overlap(appendrel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, appendrel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/* Else build the ParamPathInfo */
  	ppi = makeNode(ParamPathInfo);
--- 1783,1790 ----
  	Assert(!bms_overlap(appendrel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(appendrel, required_outer)))
! 		return ppi;
  
  	/* Else build the ParamPathInfo */
  	ppi = makeNode(ParamPathInfo);
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1359,1361 ****
--- 1795,1917 ----
  
  	return ppi;
  }
+ 
+ /*
+  * Returns a ParamPathInfo for outer relations specified by required_outer, if
+  * already available in the given rel. Returns NULL otherwise.
+  */
+ ParamPathInfo *
+ find_param_path_info(RelOptInfo *rel, Relids required_outer)
+ {
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, rel->ppilist)
+ 	{
+ 		ParamPathInfo  *ppi = (ParamPathInfo *) lfirst(lc);
+ 		if (bms_equal(ppi->ppi_req_outer, required_outer))
+ 			return ppi;
+ 	}
+ 
+ 	return NULL;
+ }
+ 
+ /*
+  * build_joinrel_partition_info
+  *		If the join between given partitioned relations is possibly partitioned,
+  *		set the partitioning scheme and partition key expressions for the
+  *		join.
+  *
+  * If the two relations have same partitioning scheme, their join may be
+  * partitioned and will follow the same partitioning scheme as the joining
+  * relations.
+  */
+ static void
+ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
+ 							 RelOptInfo *inner_rel, List *restrictlist,
+ 							 JoinType jointype)
+ {
+ 	int		num_pks;
+ 	int		cnt;
+ 	bool	is_strict;
+ 
+ 	/* Nothing to do if partition-wise join technique is disabled. */
+ 	if (!enable_partition_wise_join)
+ 	{
+ 		joinrel->part_scheme = NULL;
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * The join is not partitioned, if any of the relations being joined are
+ 	 * not partitioned or they do not have same partitioning scheme or if there
+ 	 * is no equi-join between partition keys.
+ 	 *
+ 	 * For an N-way inner join, where every syntactic inner join has equi-join
+ 	 * between partition keys and a matching partitioning scheme, partition
+ 	 * keys of N relations form an equivalence class, thus inducing an
+ 	 * equi-join between any pair of joining relations.
+ 	 *
+ 	 * For an N-way join with outer joins, where every syntactic join has an
+ 	 * equi-join between partition keys and a matching partitioning scheme,
+ 	 * outer join reordering identities in optimizer/README imply that only
+ 	 * those pairs of join are legal which have an equi-join between partition
+ 	 * keys. Thus every pair of joining relations we see here should have an
+ 	 * equi-join if this join has been deemed as a partitioned join.
+ 	 */
+ 	if (!outer_rel->part_scheme || !inner_rel->part_scheme ||
+ 		outer_rel->part_scheme != inner_rel->part_scheme ||
+ 		!have_partkey_equi_join(outer_rel, inner_rel, jointype, restrictlist,
+ 								&is_strict))
+ 	{
+ 		joinrel->part_scheme = NULL;
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * This function will be called only once for each joinrel, hence it should
+ 	 * not have partition scheme, partition key expressions and array for
+ 	 * storing child relations set.
+ 	 */
+ 	Assert(!joinrel->part_scheme && !joinrel->partexprs &&
+ 		   !joinrel->part_rels);
+ 
+ 	/*
+ 	 * Join relation is partitioned using same partitioning scheme as the
+ 	 * joining relations.
+ 	 */
+ 	joinrel->part_scheme = outer_rel->part_scheme;
+ 	num_pks = joinrel->part_scheme->partnatts;
+ 
+ 	/*
+ 	 * Construct partition keys for the join.
+ 	 *
+ 	 * An INNER join between two partitioned relations is partitioned by key
+ 	 * expressions from both the relations. For tables A and B partitioned by a
+ 	 * and b respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both
+ 	 * A.a and B.b.
+ 	 *
+ 	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
+ 	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
+ 	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b, and
+ 	 * the partition keys of an OUTER join should include
+ 	 * partition key expressions from the OUTER side only. Consider a join like
+ 	 * (A LEFT JOIN B ON A.a = B.b) LEFT JOIN C ON B.b = C.c. If we do not
+ 	 * include B.b as partition key expression for (AB), it prohibits us from
+ 	 * using partition-wise join when joining (AB) with C as there is no
+ 	 * equi-join between partition keys of joining relations. If the equality
+ 	 * operator is strict, two NULL values are never equal and no two rows from
+ 	 * mis-matching partitions can join. Hence if the equality operator is
+ 	 * strict it's safe to include B.b as partition key expression for (AB),
+ 	 * even though rows in (AB) are not strictly partitioned by B.b.
+ 	 */
+ 	joinrel->partexprs = (List **) palloc0(sizeof(List *) * num_pks);
+ 	for (cnt = 0; cnt < num_pks; cnt++)
+ 	{
+ 		List *pkexpr = list_copy(outer_rel->partexprs[cnt]);
+ 
+ 		if (jointype == JOIN_INNER || is_strict)
+ 			pkexpr = list_concat(pkexpr,
+ 								 list_copy(inner_rel->partexprs[cnt]));
+ 		joinrel->partexprs[cnt] = pkexpr;
+ 	}
+ }
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
new file mode 100644
index 0952385..dd962b7
*** a/src/backend/optimizer/util/tlist.c
--- b/src/backend/optimizer/util/tlist.c
*************** get_sortgrouplist_exprs(List *sgClauses,
*** 408,413 ****
--- 408,487 ----
  	return result;
  }
  
+ /*
+  * get_grouping_expressions
+  *
+  *		Given a "grouped target" (i.e. target where each non-GroupedVar
+  *		element must have sortgroupref set), build a list of the referencing
+  *		SortGroupClauses, a list of the corresponding grouping expressions and
+  *		a list of aggregate expressions.
+  */
+ /* Refine the function name. */
+ void
+ get_grouping_expressions(PlannerInfo *root, PathTarget *target,
+ 						 List **grouping_clauses, List **grouping_exprs,
+ 						 List **agg_exprs)
+ {
+ 	ListCell   *l;
+ 	int		i = 0;
+ 
+ 	foreach(l, target->exprs)
+ 	{
+ 		Index	sortgroupref = 0;
+ 		SortGroupClause *cl;
+ 		Expr		*texpr;
+ 
+ 		texpr = (Expr *) lfirst(l);
+ 
+ 		/* The target should contain at least one grouping column. */
+ 		Assert(target->sortgrouprefs != NULL);
+ 
+ 		if (IsA(texpr, GroupedVar))
+ 		{
+ 			/*
+ 			 * texpr should represent the first aggregate in the targetlist.
+ 			 */
+ 			break;
+ 		}
+ 
+ 		/*
+ 		 * Find the clause by sortgroupref.
+ 		 */
+ 		sortgroupref = target->sortgrouprefs[i++];
+ 
+ 		/*
+ 		 * Besides aggregates, the target should contain no expressions w/o
+ 		 * sortgroupref. Plain relation being joined to grouped can have
+ 		 * sortgroupref equal to zero for expressions contained neither in
+ 		 * grouping expression nor in aggregate arguments, but if the target
+ 		 * contains such an expression, it shouldn't be used for aggregation
+ 		 * --- see can_aggregate field of GroupedPathInfo.
+ 		 */
+ 		Assert(sortgroupref > 0);
+ 
+ 		cl = get_sortgroupref_clause(sortgroupref, root->parse->groupClause);
+ 		*grouping_clauses = list_append_unique(*grouping_clauses, cl);
+ 
+ 		/*
+ 		 * Add only unique clauses because of joins (both sides of a join can
+ 		 * point at the same grouping clause). XXX Is it worth adding a bool
+ 		 * argument indicating that we're dealing with join right now?
+ 		 */
+ 		*grouping_exprs = list_append_unique(*grouping_exprs, texpr);
+ 	}
+ 
+ 	/* Now collect the aggregates. */
+ 	while (l != NULL)
+ 	{
+ 		GroupedVar	*gvar = castNode(GroupedVar, lfirst(l));
+ 
+ 		/* Currently, GroupedVarInfo can only represent aggregate. */
+ 		Assert(gvar->agg_partial != NULL);
+ 		*agg_exprs = lappend(*agg_exprs, gvar->agg_partial);
+ 		l = lnext(l);
+ 	}
+ }
+ 
  
  /*****************************************************************************
   *		Functions to extract data from a list of SortGroupClauses
*************** apply_pathtarget_labeling_to_tlist(List
*** 783,788 ****
--- 857,1081 ----
  }
  
  /*
+  * Replace each "grouped var" in the source targetlist with the original
+  * expression.
+  *
+  * TODO Think of more suitable name. Although "grouped var" may substitute for
+  * grouping expressions in the future, currently Aggref is the only outcome of
+  * the replacement. undo_grouped_var_substitutions?
+  */
+ List *
+ restore_grouping_expressions(PlannerInfo *root, List *src)
+ {
+ 	List	*result = NIL;
+ 	ListCell	*l;
+ 
+ 	foreach(l, src)
+ 	{
+ 		TargetEntry	*te, *te_new;
+ 		Aggref	*expr_new = NULL;
+ 
+ 		te = castNode(TargetEntry, lfirst(l));
+ 
+ 		if (IsA(te->expr, GroupedVar))
+ 		{
+ 			GroupedVar	*gvar;
+ 
+ 			gvar = castNode(GroupedVar, te->expr);
+ 			expr_new = gvar->agg_partial;
+ 		}
+ 
+ 		if (expr_new != NULL)
+ 		{
+ 			te_new = flatCopyTargetEntry(te);
+ 			te_new->expr = (Expr *) expr_new;
+ 		}
+ 		else
+ 			te_new = te;
+ 		result = lappend(result, te_new);
+ 	}
+ 
+ 	return result;
+ }
+ 
+ /*
+  * For each aggregate, add a GroupedVar (which carries the partial Aggref in
+  * its agg_partial field) to the target.
+  *
+  * If the caller passes the aggregates, they must be in the form of
+  * GroupedVarInfos so that we don't have to look for gvid. If NIL is passed,
+  * the function retrieves the suitable aggregates itself.
+  *
+  * List of the aggregates added is returned. This is only useful if the
+  * function had to retrieve the aggregates itself (i.e. NIL was passed for
+  * aggregates) -- caller is expected to do extra checks in that case (and to
+  * also free the list).
+  */
+ List *
+ add_aggregates_to_target(PlannerInfo *root, PathTarget *target,
+ 						 List *aggregates, RelOptInfo *rel)
+ {
+ 	ListCell	*lc;
+ 	GroupedVarInfo	*gvi;
+ 
+ 	if (aggregates == NIL)
+ 	{
+ 		/* Caller should pass the aggregates for base relation. */
+ 		Assert(rel->reloptkind != RELOPT_BASEREL);
+ 
+ 		/* Collect all aggregates that this rel can evaluate. */
+ 		foreach(lc, root->grouped_var_list)
+ 		{
+ 			gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 
+ 			/*
+ 			 * Overlap is not guarantee of correctness alone, but caller needs
+ 			 * to do additional checks, so we're optimistic here.
+ 			 *
+ 			 * If gv_eval_at is NULL, the underlying Aggref should have
+ 			 * aggstar set.
+ 			 */
+ 			if (bms_overlap(gvi->gv_eval_at, rel->relids) ||
+ 				gvi->gv_eval_at == NULL)
+ 				aggregates = lappend(aggregates, gvi);
+ 		}
+ 
+ 		if (aggregates == NIL)
+ 			return NIL;
+ 	}
+ 
+ 	/* Create the vars and add them to the target. */
+ 	foreach(lc, aggregates)
+ 	{
+ 		GroupedVar	*gvar;
+ 
+ 		gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 		gvar = makeNode(GroupedVar);
+ 		gvar->gvid = gvi->gvid;
+ 		gvar->gvexpr = gvi->gvexpr;
+ 		gvar->agg_partial = gvi->agg_partial;
+ 		add_new_column_to_pathtarget(target, (Expr *) gvar);
+ 	}
+ 
+ 	return aggregates;
+ }
+ 
+ /*
+  * Return ressortgroupref of the target entry that is either equal to the
+  * expression or exists in the same equivalence class.
+  */
+ Index
+ get_expr_sortgroupref(PlannerInfo *root, Expr *expr)
+ {
+ 	ListCell	*lc;
+ 	Index		sortgroupref;
+ 
+ 	/*
+ 	 * First, check if the query group clause contains exactly this
+ 	 * expression.
+ 	 */
+ 	foreach(lc, root->processed_tlist)
+ 	{
+ 		TargetEntry		*te = castNode(TargetEntry, lfirst(lc));
+ 
+ 		if (equal(expr, te->expr) && te->ressortgroupref > 0)
+ 			return te->ressortgroupref;
+ 	}
+ 
+ 	/*
+ 	 * If exactly this expression is not there, check if a grouping clause
+ 	 * exists that belongs to the same equivalence class as the expression.
+ 	 */
+ 	foreach(lc, root->group_pathkeys)
+ 	{
+ 		PathKey	*pk = castNode(PathKey, lfirst(lc));
+ 		EquivalenceClass		*ec = pk->pk_eclass;
+ 		ListCell		*lm;
+ 		EquivalenceMember		*em;
+ 		Expr	*em_expr = NULL;
+ 		Query	*query = root->parse;
+ 
+ 		/*
+ 		 * Single-member EC cannot provide us with additional expression.
+ 		 */
+ 		if (list_length(ec->ec_members) < 2)
+ 			continue;
+ 
+ 		/* We need equality anywhere in the join tree. */
+ 		if (ec->ec_below_outer_join)
+ 			continue;
+ 
+ 		/*
+ 		 * TODO Reconsider this restriction. As the grouping expression is
+ 		 * only evaluated at the relation level (and only the result will be
+ 		 * propagated to the final targetlist), volatile function might be
+ 		 * o.k. Need to think what volatile EC exactly means.
+ 		 */
+ 		if (ec->ec_has_volatile)
+ 			continue;
+ 
+ 		foreach(lm, ec->ec_members)
+ 		{
+ 			em = (EquivalenceMember *) lfirst(lm);
+ 
+ 			/* The EC has !ec_below_outer_join. */
+ 			Assert(!em->em_nullable_relids);
+ 			if (equal(em->em_expr, expr))
+ 			{
+ 				em_expr = (Expr *) em->em_expr;
+ 				break;
+ 			}
+ 		}
+ 
+ 		if (em_expr == NULL)
+ 			/* Go for the next EC. */
+ 			continue;
+ 
+ 		/*
+ 		 * Find the corresponding SortGroupClause, which provides us with
+ 		 * sortgroupref. (It can belong to any EC member.)
+ 		 */
+ 		sortgroupref = 0;
+ 		foreach(lm, ec->ec_members)
+ 		{
+ 			ListCell	*lsg;
+ 
+ 			em = (EquivalenceMember *) lfirst(lm);
+ 			foreach(lsg, query->groupClause)
+ 			{
+ 				SortGroupClause	*sgc;
+ 				Expr	*expr;
+ 
+ 				sgc = (SortGroupClause *) lfirst(lsg);
+ 				expr = (Expr *) get_sortgroupclause_expr(sgc,
+ 														 query->targetList);
+ 				if (equal(em->em_expr, expr))
+ 				{
+ 					Assert(sgc->tleSortGroupRef > 0);
+ 					sortgroupref = sgc->tleSortGroupRef;
+ 					break;
+ 				}
+ 			}
+ 
+ 			if (sortgroupref > 0)
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * Since we searched in group_pathkeys, at least one EM of this EC
+ 		 * should correspond to a SortGroupClause, otherwise the EC could
+ 		 * not exist at all.
+ 		 */
+ 		Assert(sortgroupref > 0);
+ 
+ 		return sortgroupref;
+ 	}
+ 
+ 	/* No EC found in group_pathkeys. */
+ 	return 0;
+ }
+ 
+ /*
   * split_pathtarget_at_srfs
   *		Split given PathTarget into multiple levels to position SRFs safely
   *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
new file mode 100644
index 184e5da..5e3c3b4
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** get_rule_expr(Node *node, deparse_contex
*** 7559,7564 ****
--- 7559,7572 ----
  			get_agg_expr((Aggref *) node, context, (Aggref *) node);
  			break;
  
+ 		case T_GroupedVar:
+ 		{
+ 			GroupedVar *gvar = castNode(GroupedVar, node);
+ 
+ 			get_agg_expr(gvar->agg_partial, context, (Aggref *) gvar->gvexpr);
+ 			break;
+ 		}
+ 
  		case T_GroupingFunc:
  			{
  				GroupingFunc *gexpr = (GroupingFunc *) node;
*************** get_agg_combine_expr(Node *node, deparse
*** 8993,9002 ****
  	Aggref	   *aggref;
  	Aggref	   *original_aggref = private;
  
! 	if (!IsA(node, Aggref))
  		elog(ERROR, "combining Aggref does not point to an Aggref");
  
- 	aggref = (Aggref *) node;
  	get_agg_expr(aggref, context, original_aggref);
  }
  
--- 9001,9018 ----
  	Aggref	   *aggref;
  	Aggref	   *original_aggref = private;
  
! 	if (IsA(node, Aggref))
! 		aggref = (Aggref *) node;
! 	else if (IsA(node, GroupedVar))
! 	{
! 		GroupedVar *gvar = castNode(GroupedVar, node);
! 
! 		aggref = gvar->agg_partial;
! 		original_aggref = castNode(Aggref, gvar->gvexpr);
! 	}
! 	else
  		elog(ERROR, "combining Aggref does not point to an Aggref");
  
  	get_agg_expr(aggref, context, original_aggref);
  }
  
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
new file mode 100644
index a35b93b..78e24ea
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 114,119 ****
--- 114,120 ----
  #include "catalog/pg_statistic_ext.h"
  #include "catalog/pg_type.h"
  #include "executor/executor.h"
+ #include "executor/nodeAgg.h"
  #include "mb/pg_wchar.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
*************** estimate_hash_bucketsize(PlannerInfo *ro
*** 3705,3710 ****
--- 3706,3744 ----
  	return (Selectivity) estfract;
  }
  
+ /*
+  * estimate_hashagg_tablesize
+  *	  estimate the number of bytes that a hash aggregate hashtable will
+  *	  require based on the agg_costs, path width and dNumGroups.
+  *
+  * XXX this may be over-estimating the size now that hashagg knows to omit
+  * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
+  * grouping columns not in the hashed set are counted here even though hashagg
+  * won't store them. Is this a problem?
+  */
+ Size
+ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
+ 						   double dNumGroups)
+ {
+ 	Size		hashentrysize;
+ 
+ 	/* Estimate per-hash-entry space at tuple width... */
+ 	hashentrysize = MAXALIGN(path->pathtarget->width) +
+ 		MAXALIGN(SizeofMinimalTupleHeader);
+ 
+ 	/* plus space for pass-by-ref transition values... */
+ 	hashentrysize += agg_costs->transitionSpace;
+ 	/* plus the per-hash-entry overhead */
+ 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ 
+ 	/*
+ 	 * Note that this disregards the effect of fill-factor and growth policy
+ 	 * of the hash-table. That's probably ok, given that the default
+ 	 * fill-factor is relatively high. It'd be hard to meaningfully factor in
+ 	 * "double-in-size" growth policies here.
+ 	 */
+ 	return hashentrysize * dNumGroups;
+ }
  
  /*-------------------------------------------------------------------------
   *
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
new file mode 100644
index 85c6b61..cf94ccc
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** equalPartitionDescs(PartitionKey key, Pa
*** 1204,1210 ****
  			if (partdesc2->boundinfo == NULL)
  				return false;
  
! 			if (!partition_bounds_equal(key, partdesc1->boundinfo,
  										partdesc2->boundinfo))
  				return false;
  		}
--- 1204,1212 ----
  			if (partdesc2->boundinfo == NULL)
  				return false;
  
! 			if (!partition_bounds_equal(key->partnatts, key->parttyplen,
! 										key->parttypbyval,
! 										partdesc1->boundinfo,
  										partdesc2->boundinfo))
  				return false;
  		}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index a414fb2..343986d
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_bool ConfigureNames
*** 914,919 ****
--- 914,928 ----
  		true,
  		NULL, NULL, NULL
  	},
+ 	{
+ 		{"enable_partition_wise_join", PGC_USERSET, QUERY_TUNING_METHOD,
+ 			gettext_noop("Enables partition-wise join."),
+ 			NULL
+ 		},
+ 		&enable_partition_wise_join,
+ 		false,
+ 		NULL, NULL, NULL
+ 	},
  
  	{
  		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
new file mode 100644
index 421644c..e51bca1
*** a/src/include/catalog/partition.h
--- b/src/include/catalog/partition.h
*************** typedef struct PartitionDispatchData
*** 71,78 ****
  typedef struct PartitionDispatchData *PartitionDispatch;
  
  extern void RelationBuildPartitionDesc(Relation relation);
! extern bool partition_bounds_equal(PartitionKey key,
! 					   PartitionBoundInfo p1, PartitionBoundInfo p2);
  
  extern void check_new_partition_bound(char *relname, Relation parent, Node *bound);
  extern Oid	get_partition_parent(Oid relid);
--- 71,79 ----
  typedef struct PartitionDispatchData *PartitionDispatch;
  
  extern void RelationBuildPartitionDesc(Relation relation);
! extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
! 					   bool *parttypbyval, PartitionBoundInfo b1,
! 					   PartitionBoundInfo b2);
  
  extern void check_new_partition_bound(char *relname, Relation parent, Node *bound);
  extern Oid	get_partition_parent(Oid relid);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
new file mode 100644
index 6ca44f7..c57ff7b
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
*************** typedef void (*ShutdownForeignScan_funct
*** 155,160 ****
--- 155,163 ----
  typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
  															 RelOptInfo *rel,
  														 RangeTblEntry *rte);
+ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
+ 															List *fdw_private,
+ 													   RelOptInfo *child_rel);
  
  /*
   * FdwRoutine is the struct returned by a foreign-data wrapper's handler
*************** typedef struct FdwRoutine
*** 226,231 ****
--- 229,237 ----
  	InitializeDSMForeignScan_function InitializeDSMForeignScan;
  	InitializeWorkerForeignScan_function InitializeWorkerForeignScan;
  	ShutdownForeignScan_function ShutdownForeignScan;
+ 
+ 	/* Support functions for path reparameterization. */
+ 	ReparameterizeForeignPathByChild_function	ReparameterizeForeignPathByChild;
  } FdwRoutine;
  
  
diff --git a/src/include/nodes/extensible.h b/src/include/nodes/extensible.h
new file mode 100644
index 0b02cc1..1c802ad
*** a/src/include/nodes/extensible.h
--- b/src/include/nodes/extensible.h
*************** typedef struct CustomPathMethods
*** 96,101 ****
--- 96,104 ----
  												List *tlist,
  												List *clauses,
  												List *custom_plans);
+ 	struct List *(*ReparameterizeCustomPathByChild) (PlannerInfo *root,
+ 													 List *custom_private,
+ 													 RelOptInfo *child_rel);
  }	CustomPathMethods;
  
  /*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
new file mode 100644
index f59d719..ba1eac8
*** a/src/include/nodes/nodes.h
--- b/src/include/nodes/nodes.h
*************** typedef enum NodeTag
*** 218,223 ****
--- 218,224 ----
  	T_IndexOptInfo,
  	T_ForeignKeyOptInfo,
  	T_ParamPathInfo,
+ 	T_GroupedPathInfo,
  	T_Path,
  	T_IndexPath,
  	T_BitmapHeapPath,
*************** typedef enum NodeTag
*** 258,267 ****
--- 259,270 ----
  	T_PathTarget,
  	T_RestrictInfo,
  	T_PlaceHolderVar,
+ 	T_GroupedVar,
  	T_SpecialJoinInfo,
  	T_AppendRelInfo,
  	T_PartitionedChildRelInfo,
  	T_PlaceHolderInfo,
+ 	T_GroupedVarInfo,
  	T_MinMaxAggInfo,
  	T_PlannerParamItem,
  	T_RollupData,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
new file mode 100644
index 7a8e2fd..b576dd5
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 15,20 ****
--- 15,21 ----
  #define RELATION_H
  
  #include "access/sdir.h"
+ #include "catalog/partition.h"
  #include "lib/stringinfo.h"
  #include "nodes/params.h"
  #include "nodes/parsenodes.h"
*************** typedef struct PlannerInfo
*** 256,261 ****
--- 257,264 ----
  
  	List	   *placeholder_list;		/* list of PlaceHolderInfos */
  
+ 	List		*grouped_var_list; /* List of GroupedVarInfos. */
+ 
  	List	   *fkey_list;		/* list of ForeignKeyOptInfos */
  
  	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
*************** typedef struct PlannerInfo
*** 265,270 ****
--- 268,276 ----
  	List	   *distinct_pathkeys;		/* distinctClause pathkeys, if any */
  	List	   *sort_pathkeys;	/* sortClause pathkeys, if any */
  
+ 	List	   *part_schemes;	/* Canonicalised partition schemes
+ 								 * used in the query. */
+ 
  	List	   *initial_rels;	/* RelOptInfos we are now trying to join */
  
  	/* Use fetch_upper_rel() to get any particular upper rel */
*************** typedef struct PlannerInfo
*** 325,330 ****
--- 331,362 ----
  	((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
  	 rt_fetch(rti, (root)->parse->rtable))
  
+ /*
+  * Partitioning scheme
+  *		Structure to hold partitioning scheme for a given relation.
+  *
+  * Multiple relations may be partitioned in the same way. The relations
+  * resulting from joining such relations may be partitioned in the same way as
+  * the joining relations. Similarly, relations derived from such relations by
+  * grouping, sorting may be partitioned in the same way as the underlying
+  * scan relations. All such relations partitioned in the same way share the
+  * partitioning scheme.
+  *
+  * PlannerInfo stores a list of distinct "canonical" partitioning schemes.
+  * RelOptInfo of a partitioned relation holds the pointer to "canonical"
+  * partitioning scheme.
+  */
+ typedef struct PartitionSchemeData
+ {
+ 	char		strategy;		/* partition strategy */
+ 	int16		partnatts;		/* number of partition attributes */
+ 	Oid		   *partopfamily;	/* OIDs of operator families */
+ 	Oid		   *partopcintype;	/* OIDs of opclass declared input data types */
+ 	FmgrInfo   *partsupfunc;	/* lookup info for support funcs */
+ 	Oid		   *parttypcoll;	/* OIDs of collations of partition keys. */
+ } PartitionSchemeData;
+ 
+ typedef struct PartitionSchemeData *PartitionScheme;
  
  /*----------
   * RelOptInfo
*************** typedef struct PlannerInfo
*** 359,364 ****
--- 391,401 ----
   * handling join alias Vars.  Currently this is not needed because all join
   * alias Vars are expanded to non-aliased form during preprocess_expression.
   *
+  * We also have relations representing joins between child relations of
+  * different partitioned tables. These relations are not added to
+  * join_rel_level lists as they are not joined directly by the dynamic
+  * programming algorithm.
+  *
   * There is also a RelOptKind for "upper" relations, which are RelOptInfos
   * that describe post-scan/join processing steps, such as aggregation.
   * Many of the fields in these RelOptInfos are meaningless, but their Path
*************** typedef struct PlannerInfo
*** 401,406 ****
--- 438,445 ----
   *		direct_lateral_relids - rels this rel has direct LATERAL references to
   *		lateral_relids - required outer rels for LATERAL, as a Relids set
   *			(includes both direct and indirect lateral references)
+  *		gpi - GroupedPathInfo if the relation can produce grouped paths, NULL
+  *		otherwise.
   *
   * If the relation is a base relation it will have these fields set:
   *
*************** typedef struct PlannerInfo
*** 486,491 ****
--- 525,543 ----
   * We store baserestrictcost in the RelOptInfo (for base relations) because
   * we know we will need it at least once (to price the sequential scan)
   * and may need it multiple times to price index scans.
+  *
+  * If the relation is partitioned, these fields will be set:
+  * 		part_scheme - Partitioning scheme of the relation
+  * 		nparts	- Number of partitions
+  * 		boundinfo	- Partition bounds/lists
+  * 		part_rels	- RelOptInfos of the partition relations
+  * 		partexprs	- Partition key expressions
+  *
+  * Note: A base relation will always have only one set of partition keys. But a
+  * join relation is partitioned by the partition keys of joining relations.
+  * Partition keys are stored as an array of partition key expressions, with
+  * each array element containing a list of one (for a base relation) or more
+  * (as many as the number of joining relations) expressions.
   *----------
   */
  typedef enum RelOptKind
*************** typedef enum RelOptKind
*** 493,498 ****
--- 545,551 ----
  	RELOPT_BASEREL,
  	RELOPT_JOINREL,
  	RELOPT_OTHER_MEMBER_REL,
+ 	RELOPT_OTHER_JOINREL,
  	RELOPT_UPPER_REL,
  	RELOPT_DEADREL
  } RelOptKind;
*************** typedef enum RelOptKind
*** 506,518 ****
  	 (rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  /* Is the given relation a join relation? */
! #define IS_JOIN_REL(rel) ((rel)->reloptkind == RELOPT_JOINREL)
  
  /* Is the given relation an upper relation? */
  #define IS_UPPER_REL(rel) ((rel)->reloptkind == RELOPT_UPPER_REL)
  
  /* Is the given relation an "other" relation? */
! #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  typedef struct RelOptInfo
  {
--- 559,575 ----
  	 (rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  /* Is the given relation a join relation? */
! #define IS_JOIN_REL(rel)	\
! 	((rel)->reloptkind == RELOPT_JOINREL || \
! 	 (rel)->reloptkind == RELOPT_OTHER_JOINREL)
  
  /* Is the given relation an upper relation? */
  #define IS_UPPER_REL(rel) ((rel)->reloptkind == RELOPT_UPPER_REL)
  
  /* Is the given relation an "other" relation? */
! #define IS_OTHER_REL(rel) \
! 	((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
! 	 (rel)->reloptkind == RELOPT_OTHER_JOINREL)
  
  typedef struct RelOptInfo
  {
*************** typedef struct RelOptInfo
*** 548,553 ****
--- 605,613 ----
  	Relids		direct_lateral_relids;	/* rels directly laterally referenced */
  	Relids		lateral_relids; /* minimum parameterization of rel */
  
+ 	/* Information needed to produce grouped paths. */
+ 	struct GroupedPathInfo	*gpi;
+ 
  	/* information about a base rel (not set for join rels!) */
  	Index		relid;
  	Oid			reltablespace;	/* containing tablespace */
*************** typedef struct RelOptInfo
*** 566,571 ****
--- 626,632 ----
  	PlannerInfo *subroot;		/* if subquery */
  	List	   *subplan_params; /* if subquery */
  	int			rel_parallel_workers;	/* wanted number of parallel workers */
+ 	Oid		   *part_oids;		/* OIDs of partitions */
  
  	/* Information about foreign tables and foreign joins */
  	Oid			serverid;		/* identifies server for the table or join */
*************** typedef struct RelOptInfo
*** 591,596 ****
--- 652,673 ----
  
  	/* used by "other" relations */
  	Relids		top_parent_relids;		/* Relids of topmost parents */
+ 
+ 	/* For all the partitioned relations. */
+ 	PartitionScheme part_scheme;	/* Partitioning scheme. */
+ 	int			nparts;			/* number of partitions */
+ 	PartitionBoundInfo boundinfo;	/* Partition bounds/lists */
+ 	struct RelOptInfo **part_rels;		/* Array of RelOptInfos of partitions,
+ 										 * stored in the same order as bounds
+ 										 * or lists in PartitionScheme.
+ 										 */
+ 	List	  **partexprs;				/* Array of list of partition key
+ 										 * expressions. For base relations
+ 										 * these are one element lists. For
+ 										 * join there may be as many elements
+ 										 * as the number of joining
+ 										 * relations.
+ 										 */
  } RelOptInfo;
  
  /*
*************** typedef struct ParamPathInfo
*** 913,918 ****
--- 990,1017 ----
  	List	   *ppi_clauses;	/* join clauses available from outer rels */
  } ParamPathInfo;
  
+ /*
+  * GroupedPathInfo
+  *
+  * If RelOptInfo points to this structure, grouped paths can be created for
+  * it.
+  *
+  * "target" will be used as the pathtarget of grouped paths produced by this
+  * relation. A grouped path is either the result of aggregating the relation
+  * that owns this structure or, if the owning relation is a join, a join path
+  * with a grouped path on one side and a plain (i.e. not grouped) path on the
+  * other. (Two grouped paths cannot be joined in general because grouping
+  * one side of the join essentially reduces the occurrence of groups of the
+  * other side in the input of the final aggregation.)
+  */
+ typedef struct GroupedPathInfo
+ {
+ 	NodeTag		type;
+ 
+ 	PathTarget	*target;		/* output of grouped paths. */
+ 	List	*pathlist;			/* List of grouped paths. */
+ 	List	*partial_pathlist;	/* List of partial grouped paths. */
+ } GroupedPathInfo;
  
  /*
   * Type "Path" is used as-is for sequential-scan paths, as well as some other
*************** typedef struct PlaceHolderVar
*** 1852,1857 ****
--- 1951,1989 ----
  	Index		phlevelsup;		/* > 0 if PHV belongs to outer query */
  } PlaceHolderVar;
  
+ 
+ /*
+  * Similar to the concept of PlaceHolderVar, we treat aggregates and grouping
+  * columns as special variables if grouping is possible below the top-level
+  * join. The reason is that aggregates having star as the argument can be
+  * evaluated at various places in the join tree (i.e. cannot be assigned to
+  * the target list of exactly one relation). Also, this concept seems to be
+  * less invasive than adding the grouped vars to reltarget (in which case the
+  * attr_needed and attr_widths arrays of RelOptInfo would also need
+  * additional changes).
+  *
+  * gvexpr is a pointer to the gvexpr field of the corresponding
+  * GroupedVarInfo instance. It's there for the sake of exprType(),
+  * exprCollation(), etc.
+  *
+  * agg_partial also points to the corresponding field of GroupedVarInfo if the
+  * GroupedVar is in the target of a parent relation (RELOPT_BASEREL). However
+  * within a child relation's (RELOPT_OTHER_MEMBER_REL) target it points to a
+  * copy which has argument expressions translated, so they no longer reference
+  * the parent.
+  *
+  * XXX Currently we only create GroupedVar for aggregates, but eventually we
+  * could do it for grouping keys as well. That would allow grouping below the
+  * top-level join by keys other than a plain Var.
+  */
+ typedef struct GroupedVar
+ {
+ 	Expr		xpr;
+ 	Expr		*gvexpr;		/* the represented expression */
+ 	Aggref		*agg_partial;	/* partial aggregate if gvexpr is aggregate */
+ 	Index		gvid;		/* GroupedVarInfo */
+ } GroupedVar;
+ 
  /*
   * "Special join" info.
   *
*************** typedef struct PlaceHolderInfo
*** 2067,2072 ****
--- 2199,2220 ----
  } PlaceHolderInfo;
  
  /*
+  * Likewise, GroupedVarInfo exists for each distinct GroupedVar.
+  */
+ typedef struct GroupedVarInfo
+ {
+ 	NodeTag		type;
+ 
+ 	Index		gvid;			/* GroupedVar.gvid */
+ 	Expr		*gvexpr;		/* the represented expression. */
+ 	Aggref		*agg_partial;	/* if gvexpr is aggregate, agg_partial is
+ 								 * the corresponding partial aggregate */
+ 	Relids		gv_eval_at;		/* lowest level we can evaluate the expression
+ 								 * at or NULL if it can happen anywhere. */
+ 	int32		gv_width;		/* estimated width of the expression */
+ } GroupedVarInfo;
+ 
+ /*
   * This struct describes one potentially index-optimizable MIN/MAX aggregate
   * function.  MinMaxAggPath contains a list of these, and if we accept that
   * path, the list is stored into root->minmax_aggs for use during setrefs.c.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
new file mode 100644
index ed70def..ca06455
*** a/src/include/optimizer/cost.h
--- b/src/include/optimizer/cost.h
*************** extern bool enable_material;
*** 67,72 ****
--- 67,73 ----
  extern bool enable_mergejoin;
  extern bool enable_hashjoin;
  extern bool enable_gathermerge;
+ extern bool enable_partition_wise_join;
  extern int	constraint_exclusion;
  
  extern double clamp_row_est(double nrows);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
new file mode 100644
index 77bc770..4a0d845
*** a/src/include/optimizer/pathnode.h
--- b/src/include/optimizer/pathnode.h
*************** extern int compare_path_costs(Path *path
*** 25,37 ****
  extern int compare_fractional_path_costs(Path *path1, Path *path2,
  							  double fraction);
  extern void set_cheapest(RelOptInfo *parent_rel);
! extern void add_path(RelOptInfo *parent_rel, Path *new_path);
  extern bool add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer);
! extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
  extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
! 						  Cost total_cost, List *pathkeys);
  
  extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
  					Relids required_outer, int parallel_workers);
--- 25,39 ----
  extern int compare_fractional_path_costs(Path *path1, Path *path2,
  							  double fraction);
  extern void set_cheapest(RelOptInfo *parent_rel);
! extern void add_path(RelOptInfo *parent_rel, Path *new_path, bool grouped);
  extern bool add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 							  List *pathkeys, Relids required_outer, bool grouped);
! extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path,
! 							 bool grouped);
  extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
! 									  Cost total_cost, List *pathkeys,
! 									  bool grouped);
  
  extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
  					Relids required_outer, int parallel_workers);
*************** extern ForeignPath *create_foreignscan_p
*** 112,118 ****
  						Path *fdw_outerpath,
  						List *fdw_private);
  
! extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
  extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
  
  extern NestPath *create_nestloop_path(PlannerInfo *root,
--- 114,123 ----
  						Path *fdw_outerpath,
  						List *fdw_private);
  
! extern Relids calc_nestloop_required_outer(Relids outerrelids,
! 							 Relids outer_paramrels,
! 							 Relids innerrelids,
! 							 Relids inner_paramrels);
  extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
  
  extern NestPath *create_nestloop_path(PlannerInfo *root,
*************** extern NestPath *create_nestloop_path(Pl
*** 124,130 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer);
  
  extern MergePath *create_mergejoin_path(PlannerInfo *root,
  					  RelOptInfo *joinrel,
--- 129,136 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer,
! 					 PathTarget *target);
  
  extern MergePath *create_mergejoin_path(PlannerInfo *root,
  					  RelOptInfo *joinrel,
*************** extern MergePath *create_mergejoin_path(
*** 138,144 ****
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys);
  
  extern HashPath *create_hashjoin_path(PlannerInfo *root,
  					 RelOptInfo *joinrel,
--- 144,151 ----
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys,
! 					  PathTarget *target);
  
  extern HashPath *create_hashjoin_path(PlannerInfo *root,
  					 RelOptInfo *joinrel,
*************** extern HashPath *create_hashjoin_path(Pl
*** 149,155 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses);
  
  extern ProjectionPath *create_projection_path(PlannerInfo *root,
  					   RelOptInfo *rel,
--- 156,163 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses,
! 					 PathTarget *target);
  
  extern ProjectionPath *create_projection_path(PlannerInfo *root,
  					   RelOptInfo *rel,
*************** extern AggPath *create_agg_path(PlannerI
*** 190,195 ****
--- 198,217 ----
  				List *qual,
  				const AggClauseCosts *aggcosts,
  				double numGroups);
+ extern AggPath *create_partial_agg_sorted_path(PlannerInfo *root,
+ 											   Path *subpath,
+ 											   bool first_call,
+ 											   List **group_clauses,
+ 											   List **group_exprs,
+ 											   List **agg_exprs,
+ 											   double input_rows);
+ extern AggPath *create_partial_agg_hashed_path(PlannerInfo *root,
+ 											   Path *subpath,
+ 											   bool first_call,
+ 											   List **group_clauses,
+ 											   List **group_exprs,
+ 											   List **agg_exprs,
+ 											   double input_rows);
  extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
  						 RelOptInfo *rel,
  						 Path *subpath,
*************** extern LimitPath *create_limit_path(Plan
*** 248,253 ****
--- 270,277 ----
  extern Path *reparameterize_path(PlannerInfo *root, Path *path,
  					Relids required_outer,
  					double loop_count);
+ extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
+ 					RelOptInfo *child_rel);
  
  /*
   * prototypes for relnode.c
*************** extern ParamPathInfo *get_joinrel_paramp
*** 285,289 ****
--- 309,320 ----
  						  List **restrict_clauses);
  extern ParamPathInfo *get_appendrel_parampathinfo(RelOptInfo *appendrel,
  							Relids required_outer);
+ extern ParamPathInfo *find_param_path_info(RelOptInfo *rel,
+ 									  Relids required_outer);
+ extern void prepare_rel_for_grouping(PlannerInfo *root, RelOptInfo *rel);
+ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
+ 					 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+ 					 RelOptInfo *parent_joinrel, List *restrictlist,
+ 					 SpecialJoinInfo *sjinfo, JoinType jointype);
  
  #endif   /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
new file mode 100644
index 25fe78c..8dd4efd
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** extern void set_dummy_rel_pathlist(RelOp
*** 53,63 ****
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
  										Path *bitmapqual);
  
  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
--- 53,69 ----
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
! 								  bool grouped);
! extern void create_grouped_path(PlannerInfo *root, RelOptInfo *rel,
! 								Path *subpath, bool precheck, bool partial,
! 								AggStrategy aggstrategy);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
  										Path *bitmapqual);
+ extern void generate_partition_wise_join_paths(PlannerInfo *root,
+ 											   RelOptInfo *rel);
  
  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
*************** extern void debug_print_rel(PlannerInfo
*** 67,73 ****
   * indxpath.c
   *	  routines to generate index paths
   */
! extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
  extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  							  List *restrictlist,
  							  List *exprlist, List *oprlist);
--- 73,80 ----
   * indxpath.c
   *	  routines to generate index paths
   */
! extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel,
! 							   bool grouped);
  extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  							  List *restrictlist,
  							  List *exprlist, List *oprlist);
*************** extern bool have_join_order_restriction(
*** 111,116 ****
--- 118,126 ----
  							RelOptInfo *rel1, RelOptInfo *rel2);
  extern bool have_dangerous_phv(PlannerInfo *root,
  				   Relids outer_relids, Relids inner_params);
+ extern void mark_dummy_rel(RelOptInfo *rel);
+ extern bool have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2,
+ 					   JoinType jointype, List *restrictlist, bool *is_strict);
  
  /*
   * equivclass.c
diff --git a/src/include/optimizer/placeholder.h b/src/include/optimizer/placeholder.h
new file mode 100644
index 11e6403..8598268
*** a/src/include/optimizer/placeholder.h
--- b/src/include/optimizer/placeholder.h
*************** extern void fix_placeholder_input_needed
*** 28,32 ****
--- 28,34 ----
  extern void add_placeholders_to_base_rels(PlannerInfo *root);
  extern void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
  							RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+ extern void add_placeholders_to_child_joinrel(PlannerInfo *root,
+ 							RelOptInfo *childrel, RelOptInfo *parentrel);
  
  #endif   /* PLACEHOLDER_H */
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
new file mode 100644
index 5df68a2..07bc4c0
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern int	join_collapse_limit;
*** 74,80 ****
  extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
  extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
  extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
! 					   Relids where_needed, bool create_new_ph);
  extern void find_lateral_references(PlannerInfo *root);
  extern void create_lateral_join_info(PlannerInfo *root);
  extern List *deconstruct_jointree(PlannerInfo *root);
--- 74,82 ----
  extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
  extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
  extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
! 								   Relids where_needed, bool create_new_ph);
! extern void add_grouping_info_to_base_rels(PlannerInfo *root);
! extern void add_grouped_vars_to_rels(PlannerInfo *root);
  extern void find_lateral_references(PlannerInfo *root);
  extern void create_lateral_join_info(PlannerInfo *root);
  extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
new file mode 100644
index f3aaa23..4a550bb
*** a/src/include/optimizer/planner.h
--- b/src/include/optimizer/planner.h
*************** extern Expr *preprocess_phv_expression(P
*** 58,62 ****
--- 58,64 ----
  extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
  
  extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti);
+ extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
+ 									RelOptInfo *joinrel);
  
  #endif   /* PLANNER_H */
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
new file mode 100644
index 2b20b36..95802c9
*** a/src/include/optimizer/prep.h
--- b/src/include/optimizer/prep.h
*************** extern RelOptInfo *plan_set_operations(P
*** 53,61 ****
  extern void expand_inherited_tables(PlannerInfo *root);
  
  extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
! 					   AppendRelInfo *appinfo);
  
  extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  RelOptInfo *child_rel);
  
  #endif   /* PREP_H */
--- 53,74 ----
  extern void expand_inherited_tables(PlannerInfo *root);
  
  extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
! 					   int nappinfos, AppendRelInfo **appinfos);
  
  extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  Relids child_relids,
! 								  Relids top_parent_relids);
! 
! extern Relids adjust_child_relids(Relids relids, int nappinfos,
! 					AppendRelInfo **appinfos);
! 
! extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
! 						Relids relids, int *nappinfos);
! 
! extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
! 									SpecialJoinInfo *parent_sjinfo,
! 									Relids left_relids, Relids right_relids);
! extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
! 							   Relids child_relids, Relids top_parent_relids);
  
  #endif   /* PREP_H */
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
new file mode 100644
index ccb93d8..ddea03c
*** a/src/include/optimizer/tlist.h
--- b/src/include/optimizer/tlist.h
*************** extern Node *get_sortgroupclause_expr(So
*** 41,46 ****
--- 41,49 ----
  						 List *targetList);
  extern List *get_sortgrouplist_exprs(List *sgClauses,
  						List *targetList);
+ extern void get_grouping_expressions(PlannerInfo *root, PathTarget *target,
+ 									 List **grouping_clauses,
+ 									 List **grouping_exprs, List **agg_exprs);
  
  extern SortGroupClause *get_sortgroupref_clause(Index sortref,
  						List *clauses);
*************** extern void split_pathtarget_at_srfs(Pla
*** 65,70 ****
--- 68,84 ----
  						 PathTarget *target, PathTarget *input_target,
  						 List **targets, List **targets_contain_srfs);
  
+ /* TODO Find the best location (position and in some cases even file) for the
+  * following ones. */
+ extern List *restore_grouping_expressions(PlannerInfo *root, List *src);
+ extern List *add_aggregates_to_target(PlannerInfo *root, PathTarget *target,
+ 									  List *aggregates, RelOptInfo *rel);
+ extern Index get_expr_sortgroupref(PlannerInfo *root, Expr *expr);
+ /* TODO Move definition from initsplan.c to tlist.c. */
+ extern PathTarget *create_grouped_target(PlannerInfo *root, RelOptInfo *rel,
+ 										 Relids rel_agg_attrs,
+ 										 List *rel_agg_vars);
+ 
  /* Convenience macro to get a PathTarget with valid cost/width fields */
  #define create_pathtarget(root, tlist) \
  	set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
new file mode 100644
index 9f9d2dc..e05e6f6
*** a/src/include/utils/selfuncs.h
--- b/src/include/utils/selfuncs.h
*************** extern double estimate_num_groups(Planne
*** 206,211 ****
--- 206,214 ----
  
  extern Selectivity estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey,
  						 double nbuckets);
+ extern Size estimate_hashagg_tablesize(Path *path,
+ 									   const AggClauseCosts *agg_costs,
+ 									   double dNumGroups);
  
  extern List *deconstruct_indexquals(IndexPath *path);
  extern void genericcostestimate(PlannerInfo *root, IndexPath *path,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
new file mode 100644
index 6163ed8..7a969f2
*** a/src/test/regress/expected/inherit.out
--- b/src/test/regress/expected/inherit.out
*************** select tableoid::regclass::text as relna
*** 625,630 ****
--- 625,652 ----
  (3 rows)
  
  drop table parted_tab;
+ -- Check UPDATE with *multi-level partitioned* inherited target
+ create table mlparted_tab (a int, b char, c text) partition by list (a);
+ create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
+ create table mlparted_tab_part2 partition of mlparted_tab for values in (2) partition by list (b);
+ create table mlparted_tab_part3 partition of mlparted_tab for values in (3);
+ create table mlparted_tab_part2a partition of mlparted_tab_part2 for values in ('a');
+ create table mlparted_tab_part2b partition of mlparted_tab_part2 for values in ('b');
+ insert into mlparted_tab values (1, 'a'), (2, 'a'), (2, 'b'), (3, 'a');
+ update mlparted_tab mlp set c = 'xxx'
+ from
+   (select a from some_tab union all select a+1 from some_tab) ss (a)
+ where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
+ select tableoid::regclass::text as relname, mlparted_tab.* from mlparted_tab order by 1,2;
+        relname       | a | b |  c  
+ ---------------------+---+---+-----
+  mlparted_tab_part1  | 1 | a | 
+  mlparted_tab_part2a | 2 | a | 
+  mlparted_tab_part2b | 2 | b | xxx
+  mlparted_tab_part3  | 3 | a | xxx
+ (4 rows)
+ 
+ drop table mlparted_tab;
  drop table some_tab cascade;
  NOTICE:  drop cascades to table some_tab_child
  /* Test multiple inheritance of column defaults */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
new file mode 100644
index 568b783..cd1f7f3
*** a/src/test/regress/expected/sysviews.out
--- b/src/test/regress/expected/sysviews.out
*************** select count(*) >= 0 as ok from pg_prepa
*** 70,90 ****
  -- This is to record the prevailing planner enable_foo settings during
  -- a regression test run.
  select name, setting from pg_settings where name like 'enable%';
!          name         | setting 
! ----------------------+---------
!  enable_bitmapscan    | on
!  enable_gathermerge   | on
!  enable_hashagg       | on
!  enable_hashjoin      | on
!  enable_indexonlyscan | on
!  enable_indexscan     | on
!  enable_material      | on
!  enable_mergejoin     | on
!  enable_nestloop      | on
!  enable_seqscan       | on
!  enable_sort          | on
!  enable_tidscan       | on
! (12 rows)
  
  -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
  -- more-or-less working.  We can't test their contents in any great detail
--- 70,91 ----
  -- This is to record the prevailing planner enable_foo settings during
  -- a regression test run.
  select name, setting from pg_settings where name like 'enable%';
!             name            | setting 
! ----------------------------+---------
!  enable_bitmapscan          | on
!  enable_gathermerge         | on
!  enable_hashagg             | on
!  enable_hashjoin            | on
!  enable_indexonlyscan       | on
!  enable_indexscan           | on
!  enable_material            | on
!  enable_mergejoin           | on
!  enable_nestloop            | on
!  enable_partition_wise_join | off
!  enable_seqscan             | on
!  enable_sort                | on
!  enable_tidscan             | on
! (13 rows)
  
  -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
  -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
new file mode 100644
index 1f8f098..2d14885
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
*************** test: publication subscription
*** 103,109 ****
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
  # ----------
  # Another group of parallel tests
  # NB: temp.sql does a reconnect which transiently uses 2 connections,
--- 103,109 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass partition_join multi_level_partition_join
  # ----------
  # Another group of parallel tests
  # NB: temp.sql does a reconnect which transiently uses 2 connections,
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
new file mode 100644
index 04206c3..9ac24dd
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
*************** test: with
*** 179,181 ****
--- 179,183 ----
  test: xml
  test: event_trigger
  test: stats
+ test: partition_join
+ test: multi_level_partition_join
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
new file mode 100644
index d43b75c..b814a4c
*** a/src/test/regress/sql/inherit.sql
--- b/src/test/regress/sql/inherit.sql
*************** where parted_tab.a = ss.a;
*** 154,159 ****
--- 154,176 ----
  select tableoid::regclass::text as relname, parted_tab.* from parted_tab order by 1,2;
  
  drop table parted_tab;
+ 
+ -- Check UPDATE with *multi-level partitioned* inherited target
+ create table mlparted_tab (a int, b char, c text) partition by list (a);
+ create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
+ create table mlparted_tab_part2 partition of mlparted_tab for values in (2) partition by list (b);
+ create table mlparted_tab_part3 partition of mlparted_tab for values in (3);
+ create table mlparted_tab_part2a partition of mlparted_tab_part2 for values in ('a');
+ create table mlparted_tab_part2b partition of mlparted_tab_part2 for values in ('b');
+ insert into mlparted_tab values (1, 'a'), (2, 'a'), (2, 'b'), (3, 'a');
+ 
+ update mlparted_tab mlp set c = 'xxx'
+ from
+   (select a from some_tab union all select a+1 from some_tab) ss (a)
+ where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
+ select tableoid::regclass::text as relname, mlparted_tab.* from mlparted_tab order by 1,2;
+ 
+ drop table mlparted_tab;
  drop table some_tab cascade;
  
  /* Test multiple inheritance of column defaults */

test_setup_partition_wise.sql (text/plain)

query_partition_wise.sql (text/plain)

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Antonin Houska (#6)

Re: Partition-wise aggregation/grouping

On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:

> Attached is a diff that contains both patches merged. This is just to prove my
> assumption, details to be elaborated later. The scripts attached produce the
> following plan in my environment:
>
>                    QUERY PLAN
> ------------------------------------------------
>  Parallel Finalize HashAggregate
>    Group Key: b_1.j
>    ->  Append
>          ->  Parallel Partial HashAggregate
>                Group Key: b_1.j
>                ->  Hash Join
>                      Hash Cond: (b_1.j = c_1.k)
>                      ->  Seq Scan on b_1
>                      ->  Hash
>                            ->  Seq Scan on c_1
>          ->  Parallel Partial HashAggregate
>                Group Key: b_2.j
>                ->  Hash Join
>                      Hash Cond: (b_2.j = c_2.k)
>                      ->  Seq Scan on b_2
>                      ->  Hash
>                            ->  Seq Scan on c_2

Well, I'm confused. I see that there's a relationship between what
Antonin is trying to do and what Jeevan is trying to do, but I can't
figure out whether one is a subset of the other, whether they're both
orthogonal, or something else. This plan looks similar to what I
would expect Jeevan's patch to produce, except I have no idea what
"Parallel" would mean in a plan that contains no Gather node.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Robert Haas (#7)

1 attachment(s)

Re: Partition-wise aggregation/grouping

Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:
>
> > Attached is a diff that contains both patches merged. This is just to prove my
> > assumption, details to be elaborated later. The scripts attached produce the
> > following plan in my environment:
> >
> >                    QUERY PLAN
> > ------------------------------------------------
> >  Parallel Finalize HashAggregate
> >    Group Key: b_1.j
> >    ->  Append
> >          ->  Parallel Partial HashAggregate
> >                Group Key: b_1.j
> >                ->  Hash Join
> >                      Hash Cond: (b_1.j = c_1.k)
> >                      ->  Seq Scan on b_1
> >                      ->  Hash
> >                            ->  Seq Scan on c_1
> >          ->  Parallel Partial HashAggregate
> >                Group Key: b_2.j
> >                ->  Hash Join
> >                      Hash Cond: (b_2.j = c_2.k)
> >                      ->  Seq Scan on b_2
> >                      ->  Hash
> >                            ->  Seq Scan on c_2
>
> Well, I'm confused. I see that there's a relationship between what
> Antonin is trying to do and what Jeevan is trying to do, but I can't
> figure out whether one is a subset of the other, whether they're both
> orthogonal, or something else. This plan looks similar to what I
> would expect Jeevan's patch to produce,

The point is that the patch Jeevan wanted to work on is actually a subset of
[1]: /messages/by-id/9666.1491295317@localhost

> except I have no idea what "Parallel" would mean in a plan that contains no
> Gather node.

parallel_aware field was set mistakenly on the AggPath. Fixed patch is
attached below, producing this plan:

                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.j
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.j
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.j
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2

[1]: /messages/by-id/9666.1491295317@localhost

[2]: https://commitfest.postgresql.org/14/994/

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Attachments:

agg_pushdown_partition_wise_v2.diff (text/x-diff)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
new file mode 100644
index d1bc5b0..0f782dc
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
*************** AND ftoptions @> array['fetch_size=60000
*** 7248,7250 ****
--- 7248,7370 ----
  (1 row)
  
  ROLLBACK;
+ -- ===================================================================
+ -- test partition-wise-joins
+ -- ===================================================================
+ SET enable_partition_wise_join=on;
+ CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
+ CREATE TABLE fprt1_p1 (LIKE fprt1);
+ CREATE TABLE fprt1_p2 (LIKE fprt1);
+ INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
+ INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
+ CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
+ ANALYZE fprt1;
+ ANALYZE fprt1_p1;
+ ANALYZE fprt1_p2;
+ CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
+ CREATE TABLE fprt2_p1 (LIKE fprt2);
+ CREATE TABLE fprt2_p2 (LIKE fprt2);
+ INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
+ INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
+ CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
+ ANALYZE fprt2;
+ ANALYZE fprt2_p1;
+ ANALYZE fprt2_p2;
+ -- inner join three tables
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+                                                      QUERY PLAN                                                     
+ --------------------------------------------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, t3.c
+    ->  Append
+          ->  Foreign Scan
+                Relations: ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)) INNER JOIN (public.ftprt1_p1 t3)
+          ->  Foreign Scan
+                Relations: ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)) INNER JOIN (public.ftprt1_p2 t3)
+ (7 rows)
+ 
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+   a  |  b  |  c   
+ -----+-----+------
+    0 |   0 | 0000
+  150 | 150 | 0003
+  250 | 250 | 0005
+  400 | 400 | 0008
+ (4 rows)
+ 
+ -- left outer join + nullable clasue
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+                                     QUERY PLAN                                     
+ -----------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+ (5 rows)
+ 
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+  a | b |  c   
+ ---+---+------
+  0 | 0 | 0000
+  2 |   | 
+  4 |   | 
+  6 | 6 | 0000
+  8 |   | 
+ (5 rows)
+ 
+ -- with whole-row reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+                                    QUERY PLAN                                    
+ ---------------------------------------------------------------------------------
+  Sort
+    Sort Key: ((t1.*)::fprt1), ((t2.*)::fprt2)
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
+ (7 rows)
+ 
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+        t1       |       t2       
+ ----------------+----------------
+  (0,0,0000)     | (0,0,0000)
+  (150,150,0003) | (150,150,0003)
+  (250,250,0005) | (250,250,0005)
+  (400,400,0008) | (400,400,0008)
+ (4 rows)
+ 
+ -- join with lateral reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+                                    QUERY PLAN                                    
+ ---------------------------------------------------------------------------------
+  Sort
+    Sort Key: t1.a, t1.b
+    ->  Append
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2)
+          ->  Foreign Scan
+                Relations: (public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2)
+ (7 rows)
+ 
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+   a  |  b  
+ -----+-----
+    0 |   0
+  150 | 150
+  250 | 250
+  400 | 400
+ (4 rows)
+ 
+ RESET enable_partition_wise_join;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
new file mode 100644
index 509bb54..76a0551
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
*************** WHERE ftrelid = 'table30000'::regclass
*** 1717,1719 ****
--- 1717,1772 ----
  AND ftoptions @> array['fetch_size=60000'];
  
  ROLLBACK;
+ 
+ -- ===================================================================
+ -- test partition-wise-joins
+ -- ===================================================================
+ SET enable_partition_wise_join=on;
+ 
+ CREATE TABLE fprt1 (a int, b int, c varchar) PARTITION BY RANGE(a);
+ CREATE TABLE fprt1_p1 (LIKE fprt1);
+ CREATE TABLE fprt1_p2 (LIKE fprt1);
+ INSERT INTO fprt1_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 2) i;
+ INSERT INTO fprt1_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 2) i;
+ CREATE FOREIGN TABLE ftprt1_p1 PARTITION OF fprt1 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt1_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt1_p2 PARTITION OF fprt1 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (TABLE_NAME 'fprt1_p2');
+ ANALYZE fprt1;
+ ANALYZE fprt1_p1;
+ ANALYZE fprt1_p2;
+ 
+ CREATE TABLE fprt2 (a int, b int, c varchar) PARTITION BY RANGE(b);
+ CREATE TABLE fprt2_p1 (LIKE fprt2);
+ CREATE TABLE fprt2_p2 (LIKE fprt2);
+ INSERT INTO fprt2_p1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 249, 3) i;
+ INSERT INTO fprt2_p2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(250, 499, 3) i;
+ CREATE FOREIGN TABLE ftprt2_p1 PARTITION OF fprt2 FOR VALUES FROM (0) TO (250)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p1', use_remote_estimate 'true');
+ CREATE FOREIGN TABLE ftprt2_p2 PARTITION OF fprt2 FOR VALUES FROM (250) TO (500)
+ 	SERVER loopback OPTIONS (table_name 'fprt2_p2', use_remote_estimate 'true');
+ ANALYZE fprt2;
+ ANALYZE fprt2_p1;
+ ANALYZE fprt2_p2;
+ 
+ -- inner join three tables
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
+ 
+ -- left outer join + nullable clasue
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
+ 
+ -- with whole-row reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+ SELECT t1,t2 FROM fprt1 t1 JOIN fprt2 t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a % 25 =0 ORDER BY 1,2;
+ 
+ -- join with lateral reference
+ EXPLAIN (COSTS OFF)
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+ SELECT t1.a,t1.b FROM fprt1 t1, LATERAL (SELECT t2.a, t2.b FROM fprt2 t2 WHERE t1.a = t2.b AND t1.b = t2.a) q WHERE t1.a%25 = 0 ORDER BY 1,2;
+ 
+ RESET enable_partition_wise_join;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index e02b0c8..c4d9228
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** ANY <replaceable class="parameter">num_s
*** 3643,3648 ****
--- 3643,3667 ----
        </listitem>
       </varlistentry>
  
+      <varlistentry id="guc-enable-partition-wise-join" xreflabel="enable_partition_wise_join">
+       <term><varname>enable_partition_wise_join</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>enable_partition_wise_join</> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Enables or disables the query planner's use of partition-wise join
+         plans. When enabled, the planner spends time creating paths for joins
+         between partitions and consumes memory constructing expression nodes
+         for those joins, even if partition-wise join does not produce the
+         cheapest path. Both time and memory increase exponentially with the
+         number of partitioned tables being joined and linearly with the
+         number of partitions. The default is <literal>off</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       <varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
        <term><varname>enable_seqscan</varname> (<type>boolean</type>)
        <indexterm>
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
new file mode 100644
index dbeaab5..ac8c2fa
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
*************** ShutdownForeignScan(ForeignScanState *no
*** 1270,1275 ****
--- 1270,1295 ----
     </para>
     </sect2>
  
+    <sect2 id="fdw-callbacks-reparameterize-paths">
+     <title>FDW Routines For Reparameterization of Paths</title>
+ 
+     <para>
+ <programlisting>
+ List *
+ ReparameterizeForeignPathByChild(PlannerInfo *root, List *fdw_private,
+                                  RelOptInfo *child_rel);
+ </programlisting>
+     This function is called while converting a path parameterized by the
+     top-most parent of the given child relation <literal>child_rel</> to be
+     parameterized by the child relation. The function is used to reparameterize
+     any paths or translate any expression nodes saved in the given
+     <literal>fdw_private</> member of a <structname>ForeignPath</>. The
+     callback may use <literal>reparameterize_path_by_child</>,
+     <literal>adjust_appendrel_attrs</> or
+     <literal>adjust_appendrel_attrs_multilevel</> as required.
+     </para>
+    </sect2>
+ 
     </sect1>
  
     <sect1 id="fdw-helpers">
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
new file mode 100644
index e0d2665..c44bb0e
*** a/src/backend/catalog/partition.c
--- b/src/backend/catalog/partition.c
*************** static List *generate_partition_qual(Rel
*** 126,140 ****
  
  static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
  					 List *datums, bool lower);
! static int32 partition_rbound_cmp(PartitionKey key,
! 					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2);
! static int32 partition_rbound_datum_cmp(PartitionKey key,
! 						   Datum *rb_datums, RangeDatumContent *rb_content,
! 						   Datum *tuple_datums);
  
! static int32 partition_bound_cmp(PartitionKey key,
! 					PartitionBoundInfo boundinfo,
  					int offset, void *probe, bool probe_is_bound);
  static int partition_bound_bsearch(PartitionKey key,
  						PartitionBoundInfo boundinfo,
--- 126,141 ----
  
  static PartitionRangeBound *make_one_range_bound(PartitionKey key, int index,
  					 List *datums, bool lower);
! static int32 partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc,
! 					 Oid *partcollation, Datum *datums1,
! 					 RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2);
! static int32 partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
! 						   Oid *partcollation, Datum *rb_datums,
! 						   RangeDatumContent *rb_content, Datum *tuple_datums);
  
! static int32 partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc,
! 					Oid *partcollation, PartitionBoundInfo boundinfo,
  					int offset, void *probe, bool probe_is_bound);
  static int partition_bound_bsearch(PartitionKey key,
  						PartitionBoundInfo boundinfo,
*************** RelationBuildPartitionDesc(Relation rel)
*** 592,598 ****
   * representation of partition bounds.
   */
  bool
! partition_bounds_equal(PartitionKey key,
  					   PartitionBoundInfo b1, PartitionBoundInfo b2)
  {
  	int			i;
--- 593,599 ----
   * representation of partition bounds.
   */
  bool
! partition_bounds_equal(int partnatts, int16 *parttyplen, bool *parttypbyval,
  					   PartitionBoundInfo b1, PartitionBoundInfo b2)
  {
  	int			i;
*************** partition_bounds_equal(PartitionKey key,
*** 613,619 ****
  	{
  		int			j;
  
! 		for (j = 0; j < key->partnatts; j++)
  		{
  			/* For range partitions, the bounds might not be finite. */
  			if (b1->content != NULL)
--- 614,620 ----
  	{
  		int			j;
  
! 		for (j = 0; j < partnatts; j++)
  		{
  			/* For range partitions, the bounds might not be finite. */
  			if (b1->content != NULL)
*************** partition_bounds_equal(PartitionKey key,
*** 642,649 ****
  			 * context.  datumIsEqual() should be simple enough to be safe.
  			 */
  			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
! 							  key->parttypbyval[j],
! 							  key->parttyplen[j]))
  				return false;
  		}
  
--- 643,649 ----
  			 * context.  datumIsEqual() should be simple enough to be safe.
  			 */
  			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
! 							  parttypbyval[j], parttyplen[j]))
  				return false;
  		}
  
*************** partition_bounds_equal(PartitionKey key,
*** 652,658 ****
  	}
  
  	/* There are ndatums+1 indexes in case of range partitions */
! 	if (key->strategy == PARTITION_STRATEGY_RANGE &&
  		b1->indexes[i] != b2->indexes[i])
  		return false;
  
--- 652,658 ----
  	}
  
  	/* There are ndatums+1 indexes in case of range partitions */
! 	if (b1->strategy == PARTITION_STRATEGY_RANGE &&
  		b1->indexes[i] != b2->indexes[i])
  		return false;
  
*************** check_new_partition_bound(char *relname,
*** 734,741 ****
  				 * First check if the resulting range would be empty with
  				 * specified lower and upper bounds
  				 */
! 				if (partition_rbound_cmp(key, lower->datums, lower->content, true,
! 										 upper) >= 0)
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
  					errmsg("cannot create range partition with empty range"),
--- 734,742 ----
  				 * First check if the resulting range would be empty with
  				 * specified lower and upper bounds
  				 */
! 				if (partition_rbound_cmp(key->partnatts, key->partsupfunc,
! 										 key->partcollation, lower->datums,
! 										 lower->content, true, upper) >= 0)
  					ereport(ERROR,
  							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
  					errmsg("cannot create range partition with empty range"),
*************** qsort_partition_rbound_cmp(const void *a
*** 1865,1871 ****
  	PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
  	PartitionKey key = (PartitionKey) arg;
  
! 	return partition_rbound_cmp(key, b1->datums, b1->content, b1->lower, b2);
  }
  
  /*
--- 1866,1874 ----
  	PartitionRangeBound *b2 = (*(PartitionRangeBound *const *) b);
  	PartitionKey key = (PartitionKey) arg;
  
! 	return partition_rbound_cmp(key->partnatts, key->partsupfunc,
! 								key->partcollation, b1->datums, b1->content,
! 								b1->lower, b2);
  }
  
  /*
*************** qsort_partition_rbound_cmp(const void *a
*** 1875,1881 ****
   * content1, and lower1) is <=, =, >= the bound specified in *b2
   */
  static int32
! partition_rbound_cmp(PartitionKey key,
  					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2)
  {
--- 1878,1884 ----
   * content1, and lower1) is <=, =, >= the bound specified in *b2
   */
  static int32
! partition_rbound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
  					 Datum *datums1, RangeDatumContent *content1, bool lower1,
  					 PartitionRangeBound *b2)
  {
*************** partition_rbound_cmp(PartitionKey key,
*** 1885,1891 ****
  	RangeDatumContent *content2 = b2->content;
  	bool		lower2 = b2->lower;
  
! 	for (i = 0; i < key->partnatts; i++)
  	{
  		/*
  		 * First, handle cases involving infinity, which don't require
--- 1888,1894 ----
  	RangeDatumContent *content2 = b2->content;
  	bool		lower2 = b2->lower;
  
! 	for (i = 0; i < partnatts; i++)
  	{
  		/*
  		 * First, handle cases involving infinity, which don't require
*************** partition_rbound_cmp(PartitionKey key,
*** 1905,1912 ****
  		else if (content2[i] != RANGE_DATUM_FINITE)
  			return content2[i] == RANGE_DATUM_NEG_INF ? 1 : -1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
! 												 key->partcollation[i],
  												 datums1[i],
  												 datums2[i]));
  		if (cmpval != 0)
--- 1908,1915 ----
  		else if (content2[i] != RANGE_DATUM_FINITE)
  			return content2[i] == RANGE_DATUM_NEG_INF ? 1 : -1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
! 												 partcollation[i],
  												 datums1[i],
  												 datums2[i]));
  		if (cmpval != 0)
*************** partition_rbound_cmp(PartitionKey key,
*** 1932,1951 ****
   * rb_lower) <=, =, >= partition key of tuple (tuple_datums)
   */
  static int32
! partition_rbound_datum_cmp(PartitionKey key,
! 						   Datum *rb_datums, RangeDatumContent *rb_content,
! 						   Datum *tuple_datums)
  {
  	int			i;
  	int32		cmpval = -1;
  
! 	for (i = 0; i < key->partnatts; i++)
  	{
  		if (rb_content[i] != RANGE_DATUM_FINITE)
  			return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[i],
! 												 key->partcollation[i],
  												 rb_datums[i],
  												 tuple_datums[i]));
  		if (cmpval != 0)
--- 1935,1954 ----
   * rb_lower) <=, =, >= partition key of tuple (tuple_datums)
   */
  static int32
! partition_rbound_datum_cmp(int partnatts, FmgrInfo *partsupfunc,
! 						   Oid *partcollation, Datum *rb_datums,
! 						   RangeDatumContent *rb_content, Datum *tuple_datums)
  {
  	int			i;
  	int32		cmpval = -1;
  
! 	for (i = 0; i < partnatts; i++)
  	{
  		if (rb_content[i] != RANGE_DATUM_FINITE)
  			return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;
  
! 		cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[i],
! 												 partcollation[i],
  												 rb_datums[i],
  												 tuple_datums[i]));
  		if (cmpval != 0)
*************** partition_rbound_datum_cmp(PartitionKey
*** 1962,1978 ****
   * specified in *probe.
   */
  static int32
! partition_bound_cmp(PartitionKey key, PartitionBoundInfo boundinfo,
! 					int offset, void *probe, bool probe_is_bound)
  {
  	Datum	   *bound_datums = boundinfo->datums[offset];
  	int32		cmpval = -1;
  
! 	switch (key->strategy)
  	{
  		case PARTITION_STRATEGY_LIST:
! 			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[0],
! 													 key->partcollation[0],
  													 bound_datums[0],
  													 *(Datum *) probe));
  			break;
--- 1965,1982 ----
   * specified in *probe.
   */
  static int32
! partition_bound_cmp(int partnatts, FmgrInfo *partsupfunc, Oid *partcollation,
! 					PartitionBoundInfo boundinfo, int offset, void *probe,
! 					bool probe_is_bound)
  {
  	Datum	   *bound_datums = boundinfo->datums[offset];
  	int32		cmpval = -1;
  
! 	switch (boundinfo->strategy)
  	{
  		case PARTITION_STRATEGY_LIST:
! 			cmpval = DatumGetInt32(FunctionCall2Coll(&partsupfunc[0],
! 													 partcollation[0],
  													 bound_datums[0],
  													 *(Datum *) probe));
  			break;
*************** partition_bound_cmp(PartitionKey key, Pa
*** 1990,2001 ****
  					 */
  					bool		lower = boundinfo->indexes[offset] < 0;
  
! 					cmpval = partition_rbound_cmp(key,
! 												bound_datums, content, lower,
! 											  (PartitionRangeBound *) probe);
  				}
  				else
! 					cmpval = partition_rbound_datum_cmp(key,
  														bound_datums, content,
  														(Datum *) probe);
  				break;
--- 1994,2007 ----
  					 */
  					bool		lower = boundinfo->indexes[offset] < 0;
  
! 					cmpval = partition_rbound_cmp(partnatts, partsupfunc,
! 												  partcollation, bound_datums,
! 												  content, lower,
! 												(PartitionRangeBound *) probe);
  				}
  				else
! 					cmpval = partition_rbound_datum_cmp(partnatts, partsupfunc,
! 														partcollation,
  														bound_datums, content,
  														(Datum *) probe);
  				break;
*************** partition_bound_cmp(PartitionKey key, Pa
*** 2003,2009 ****
  
  		default:
  			elog(ERROR, "unexpected partition strategy: %d",
! 				 (int) key->strategy);
  	}
  
  	return cmpval;
--- 2009,2015 ----
  
  		default:
  			elog(ERROR, "unexpected partition strategy: %d",
! 				 (int) boundinfo->strategy);
  	}
  
  	return cmpval;
*************** partition_bound_bsearch(PartitionKey key
*** 2037,2043 ****
  		int32		cmpval;
  
  		mid = (lo + hi + 1) / 2;
! 		cmpval = partition_bound_cmp(key, boundinfo, mid, probe,
  									 probe_is_bound);
  		if (cmpval <= 0)
  		{
--- 2043,2050 ----
  		int32		cmpval;
  
  		mid = (lo + hi + 1) / 2;
! 		cmpval = partition_bound_cmp(key->partnatts, key->partsupfunc,
! 									 key->partcollation, boundinfo, mid, probe,
  									 probe_is_bound);
  		if (cmpval <= 0)
  		{
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
new file mode 100644
index 5a34a46..717763d
*** a/src/backend/executor/execExpr.c
--- b/src/backend/executor/execExpr.c
*************** ExecInitExprRec(Expr *node, PlanState *p
*** 723,728 ****
--- 723,755 ----
  				break;
  			}
  
+ 		case T_GroupedVar:
+ 			/*
+ 			 * GroupedVar is treated as an aggregate if it appears in the
+ 			 * targetlist of Agg node, but as a normal variable elsewhere.
+ 			 */
+ 			if (parent && (IsA(parent, AggState)))
+ 			{
+ 				GroupedVar *gvar = (GroupedVar *) node;
+ 
+ 				/*
+ 				 * Currently GroupedVar can only represent a partial aggregate.
+ 				 */
+ 				Assert(gvar->agg_partial != NULL);
+ 
+ 				ExecInitExprRec((Expr *) gvar->agg_partial, parent, state,
+ 								resv, resnull);
+ 				break;
+ 			}
+ 			else
+ 			{
+ 				/*
+ 				 * set_plan_refs should have replaced GroupedVar in the
+ 				 * targetlist with an ordinary Var.
+ 				 */
+ 				elog(ERROR, "parent of GroupedVar is not Agg node");
+ 			}
+ 
  		case T_GroupingFunc:
  			{
  				GroupingFunc *grp_node = (GroupingFunc *) node;
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
new file mode 100644
index c2b8618..c4cb4c0
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** find_unaggregated_cols_walker(Node *node
*** 1829,1834 ****
--- 1829,1845 ----
  		/* do not descend into aggregate exprs */
  		return false;
  	}
+ 	if (IsA(node, GroupedVar))
+ 	{
+ 		GroupedVar	   *gvar = (GroupedVar *) node;
+ 
+ 		/*
+ 		 * GroupedVar is currently used only for partial aggregation, so treat
+ 		 * it like an Aggref above.
+ 		 */
+ 		Assert(gvar->agg_partial != NULL);
+ 		return false;
+ 	}
  	return expression_tree_walker(node, find_unaggregated_cols_walker,
  								  (void *) colnos);
  }
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
new file mode 100644
index 00a0fed..7d188ea
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyPlaceHolderVar(const PlaceHolderVar
*** 2206,2211 ****
--- 2206,2226 ----
  }
  
  /*
+  * _copyGroupedVar
+  */
+ static GroupedVar *
+ _copyGroupedVar(const GroupedVar *from)
+ {
+ 	GroupedVar *newnode = makeNode(GroupedVar);
+ 
+ 	COPY_NODE_FIELD(gvexpr);
+ 	COPY_NODE_FIELD(agg_partial);
+ 	COPY_SCALAR_FIELD(gvid);
+ 
+ 	return newnode;
+ }
+ 
+ /*
   * _copySpecialJoinInfo
   */
  static SpecialJoinInfo *
*************** copyObjectImpl(const void *from)
*** 4984,4989 ****
--- 4999,5007 ----
  		case T_PlaceHolderVar:
  			retval = _copyPlaceHolderVar(from);
  			break;
+ 		case T_GroupedVar:
+ 			retval = _copyGroupedVar(from);
+ 			break;
  		case T_SpecialJoinInfo:
  			retval = _copySpecialJoinInfo(from);
  			break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
new file mode 100644
index 46573ae..f1dacd5
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*************** _equalPlaceHolderVar(const PlaceHolderVa
*** 874,879 ****
--- 874,887 ----
  }
  
  static bool
+ _equalGroupedVar(const GroupedVar *a, const GroupedVar *b)
+ {
+ 	COMPARE_SCALAR_FIELD(gvid);
+ 
+ 	return true;
+ }
+ 
+ static bool
  _equalSpecialJoinInfo(const SpecialJoinInfo *a, const SpecialJoinInfo *b)
  {
  	COMPARE_BITMAPSET_FIELD(min_lefthand);
*************** equal(const void *a, const void *b)
*** 3148,3153 ****
--- 3156,3164 ----
  		case T_PlaceHolderVar:
  			retval = _equalPlaceHolderVar(a, b);
  			break;
+ 		case T_GroupedVar:
+ 			retval = _equalGroupedVar(a, b);
+ 			break;
  		case T_SpecialJoinInfo:
  			retval = _equalSpecialJoinInfo(a, b);
  			break;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
new file mode 100644
index 3e8189c..5c00e55
*** a/src/backend/nodes/nodeFuncs.c
--- b/src/backend/nodes/nodeFuncs.c
*************** exprType(const Node *expr)
*** 259,264 ****
--- 259,267 ----
  		case T_PlaceHolderVar:
  			type = exprType((Node *) ((const PlaceHolderVar *) expr)->phexpr);
  			break;
+ 		case T_GroupedVar:
+ 			type = exprType((Node *) ((const GroupedVar *) expr)->agg_partial);
+ 			break;
  		default:
  			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
  			type = InvalidOid;	/* keep compiler quiet */
*************** exprCollation(const Node *expr)
*** 931,936 ****
--- 934,942 ----
  		case T_PlaceHolderVar:
  			coll = exprCollation((Node *) ((const PlaceHolderVar *) expr)->phexpr);
  			break;
+ 		case T_GroupedVar:
+ 			coll = exprCollation((Node *) ((const GroupedVar *) expr)->gvexpr);
+ 			break;
  		default:
  			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
  			coll = InvalidOid;	/* keep compiler quiet */
*************** expression_tree_walker(Node *node,
*** 2198,2203 ****
--- 2204,2211 ----
  			break;
  		case T_PlaceHolderVar:
  			return walker(((PlaceHolderVar *) node)->phexpr, context);
+ 		case T_GroupedVar:
+ 			return walker(((GroupedVar *) node)->gvexpr, context);
  		case T_InferenceElem:
  			return walker(((InferenceElem *) node)->expr, context);
  		case T_AppendRelInfo:
*************** expression_tree_mutator(Node *node,
*** 2989,2994 ****
--- 2997,3012 ----
  				return (Node *) newnode;
  			}
  			break;
+ 		case T_GroupedVar:
+ 			{
+ 				GroupedVar *gv = (GroupedVar *) node;
+ 				GroupedVar *newnode;
+ 
+ 				FLATCOPY(newnode, gv, GroupedVar);
+ 				MUTATE(newnode->gvexpr, gv->gvexpr, Expr *);
+ 				MUTATE(newnode->agg_partial, gv->agg_partial, Aggref *);
+ 				return (Node *) newnode;
+ 			}
  		case T_InferenceElem:
  			{
  				InferenceElem *inferenceelemdexpr = (InferenceElem *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
new file mode 100644
index 28cef85..4b6ee30
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outPlannerInfo(StringInfo str, const Pl
*** 2186,2191 ****
--- 2186,2192 ----
  	WRITE_NODE_FIELD(pcinfo_list);
  	WRITE_NODE_FIELD(rowMarks);
  	WRITE_NODE_FIELD(placeholder_list);
+ 	WRITE_NODE_FIELD(grouped_var_list);
  	WRITE_NODE_FIELD(fkey_list);
  	WRITE_NODE_FIELD(query_pathkeys);
  	WRITE_NODE_FIELD(group_pathkeys);
*************** _outParamPathInfo(StringInfo str, const
*** 2408,2413 ****
--- 2409,2424 ----
  }
  
  static void
+ _outGroupedPathInfo(StringInfo str, const GroupedPathInfo *node)
+ {
+ 	WRITE_NODE_TYPE("GROUPEDPATHINFO");
+ 
+ 	WRITE_NODE_FIELD(target);
+ 	WRITE_NODE_FIELD(pathlist);
+ 	WRITE_NODE_FIELD(partial_pathlist);
+ }
+ 
+ static void
  _outRestrictInfo(StringInfo str, const RestrictInfo *node)
  {
  	WRITE_NODE_TYPE("RESTRICTINFO");
*************** _outPlaceHolderVar(StringInfo str, const
*** 2451,2456 ****
--- 2462,2477 ----
  }
  
  static void
+ _outGroupedVar(StringInfo str, const GroupedVar *node)
+ {
+ 	WRITE_NODE_TYPE("GROUPEDVAR");
+ 
+ 	WRITE_NODE_FIELD(gvexpr);
+ 	WRITE_NODE_FIELD(agg_partial);
+ 	WRITE_UINT_FIELD(gvid);
+ }
+ 
+ static void
  _outSpecialJoinInfo(StringInfo str, const SpecialJoinInfo *node)
  {
  	WRITE_NODE_TYPE("SPECIALJOININFO");
*************** outNode(StringInfo str, const void *obj)
*** 3996,4007 ****
--- 4017,4034 ----
  			case T_ParamPathInfo:
  				_outParamPathInfo(str, obj);
  				break;
+ 			case T_GroupedPathInfo:
+ 				_outGroupedPathInfo(str, obj);
+ 				break;
  			case T_RestrictInfo:
  				_outRestrictInfo(str, obj);
  				break;
  			case T_PlaceHolderVar:
  				_outPlaceHolderVar(str, obj);
  				break;
+ 			case T_GroupedVar:
+ 				_outGroupedVar(str, obj);
+ 				break;
  			case T_SpecialJoinInfo:
  				_outSpecialJoinInfo(str, obj);
  				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
new file mode 100644
index a883220..138f71c
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
*************** _readVar(void)
*** 522,527 ****
--- 522,542 ----
  }
  
  /*
+  * _readGroupedVar
+  */
+ static GroupedVar *
+ _readGroupedVar(void)
+ {
+ 	READ_LOCALS(GroupedVar);
+ 
+ 	READ_NODE_FIELD(gvexpr);
+ 	READ_NODE_FIELD(agg_partial);
+ 	READ_UINT_FIELD(gvid);
+ 
+ 	READ_DONE();
+ }
+ 
+ /*
   * _readConst
   */
  static Const *
*************** parseNodeString(void)
*** 2440,2445 ****
--- 2455,2462 ----
  		return_value = _readTableFunc();
  	else if (MATCH("VAR", 3))
  		return_value = _readVar();
+ 	else if (MATCH("GROUPEDVAR", 10))
+ 		return_value = _readGroupedVar();
  	else if (MATCH("CONST", 5))
  		return_value = _readConst();
  	else if (MATCH("PARAM", 5))
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
new file mode 100644
index fc0fca4..eee093f
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*************** be desirable to postpone the Gather stag
*** 1076,1078 ****
--- 1076,1105 ----
  plan as possible.  Expanding the range of cases in which more work can be
  pushed below the Gather (and costing them accurately) is likely to keep us
  busy for a long time to come.
+ 
+ Partition-wise joins
+ --------------------
+ A join between two similarly partitioned tables can be broken down into joins
+ between their matching partitions if there exists an equi-join condition
+ between the partition keys of the joining tables. The equi-join between
+ partition keys implies that all join partners for a given row in one
+ partitioned table must be in the corresponding partition of the other
+ partitioned table. The join partners cannot be found in other partitions. This
+ condition allows the join between partitioned tables to be broken into joins
+ between the matching partitions. The resultant join is partitioned in the same
+ way as the joining relations, thus allowing an N-way join between similarly
+ partitioned tables having equi-join condition between their partition keys to
+ be broken down into N-way joins between their matching partitions. This
+ technique of breaking down a join between partitioned tables into joins
+ between their partitions is called partition-wise join. We will use the term
+ "partitioned relation" for both a partitioned table and a join between
+ partitioned tables which can use the partition-wise join technique.
+ 
+ Partitioning properties of a partitioned table are stored in a
+ PartitionSchemeData structure. The planner maintains a list of canonical
+ partition schemes (distinct PartitionSchemeData objects) so that any two
+ partitioned relations with the same partitioning scheme share the same
+ PartitionSchemeData object. This reduces the memory consumed by
+ PartitionSchemeData objects and makes it easy to compare the partition schemes
+ of joining relations. RelOptInfos of partitioned relations hold partition key
+ expressions and the RelOptInfos of the partition relations of that relation.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
new file mode 100644
index b5cab0c..1ad910d
*** a/src/backend/optimizer/geqo/geqo_eval.c
--- b/src/backend/optimizer/geqo/geqo_eval.c
*************** merge_clump(PlannerInfo *root, List *clu
*** 264,271 ****
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
  				/* Create GatherPaths for any useful partial paths for rel */
! 				generate_gather_paths(root, joinrel);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
--- 264,279 ----
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
+ 
+ 				/*
+ 				 * Create "append" paths for partitioned joins. Do this before
+ 				 * creating GatherPaths so that partial "append" paths in
+ 				 * partitioned joins will be considered.
+ 				 */
+ 				generate_partition_wise_join_paths(root, joinrel);
+ 
  				/* Create GatherPaths for any useful partial paths for rel */
! 				generate_gather_paths(root, joinrel, false);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
new file mode 100644
index b93b4fc..83a2c37
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***************
*** 24,29 ****
--- 24,30 ----
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
  #include "foreign/fdwapi.h"
+ #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #ifdef OPTIMIZER_DEBUG
*************** set_rel_pathlist(PlannerInfo *root, RelO
*** 486,492 ****
  	 * we'll consider gathering partial paths for the parent appendrel.)
  	 */
  	if (rel->reloptkind == RELOPT_BASEREL)
! 		generate_gather_paths(root, rel);
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
--- 487,496 ----
  	 * we'll consider gathering partial paths for the parent appendrel.)
  	 */
  	if (rel->reloptkind == RELOPT_BASEREL)
! 	{
! 		generate_gather_paths(root, rel, false);
! 		generate_gather_paths(root, rel, true);
! 	}
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
*************** static void
*** 686,691 ****
--- 690,696 ----
  set_plain_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  {
  	Relids		required_outer;
+ 	Path		*seq_path;
  
  	/*
  	 * We don't support pushing join clauses into the quals of a seqscan, but
*************** set_plain_rel_pathlist(PlannerInfo *root
*** 694,708 ****
  	 */
  	required_outer = rel->lateral_relids;
  
! 	/* Consider sequential scan */
! 	add_path(rel, create_seqscan_path(root, rel, required_outer, 0));
  
! 	/* If appropriate, consider parallel sequential scan */
  	if (rel->consider_parallel && required_outer == NULL)
  		create_plain_partial_paths(root, rel);
  
  	/* Consider index scans */
! 	create_index_paths(root, rel);
  
  	/* Consider TID scans */
  	create_tidscan_paths(root, rel);
--- 699,726 ----
  	 */
  	required_outer = rel->lateral_relids;
  
! 	/* Consider sequential scan, both plain and grouped. */
! 	seq_path = create_seqscan_path(root, rel, required_outer, 0);
! 	add_path(rel, seq_path, false);
! 	if (rel->gpi != NULL && required_outer == NULL)
! 		create_grouped_path(root, rel, seq_path, false, false, AGG_HASHED);
  
! 	/* If appropriate, consider parallel sequential scan (plain or grouped) */
  	if (rel->consider_parallel && required_outer == NULL)
  		create_plain_partial_paths(root, rel);
  
  	/* Consider index scans */
! 	create_index_paths(root, rel, false);
! 	if (rel->gpi != NULL)
! 	{
! 		/*
! 		 * TODO Instead of calling the whole clause-matching machinery twice
! 		 * (there should be no difference between plain and grouped paths from
! 		 * this point of view), consider returning a separate list of paths
! 		 * usable as grouped ones.
! 		 */
! 		create_index_paths(root, rel, true);
! 	}
  
  	/* Consider TID scans */
  	create_tidscan_paths(root, rel);
*************** static void
*** 716,721 ****
--- 734,740 ----
  create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	int			parallel_workers;
+ 	Path		*path;
  
  	parallel_workers = compute_parallel_worker(rel, rel->pages, -1);
  
*************** create_plain_partial_paths(PlannerInfo *
*** 724,730 ****
  		return;
  
  	/* Add an unordered partial path based on a parallel sequential scan. */
! 	add_partial_path(rel, create_seqscan_path(root, rel, NULL, parallel_workers));
  }
  
  /*
--- 743,850 ----
  		return;
  
  	/* Add an unordered partial path based on a parallel sequential scan. */
! 	path = create_seqscan_path(root, rel, NULL, parallel_workers);
! 	add_partial_path(rel, path, false);
! 
! 	/*
! 	 * Do partial aggregation at base relation level if the relation is
! 	 * eligible for it.
! 	 */
! 	if (rel->gpi != NULL)
! 		create_grouped_path(root, rel, path, false, true, AGG_HASHED);
! }
! 
! /*
!  * Apply partial aggregation to a subpath and add the AggPath to the
!  * appropriate pathlist.
!  *
!  * "precheck" tells whether the aggregation path should first be checked using
!  * add_path_precheck().
!  *
!  * If "partial" is true, the resulting path is considered partial in terms of
!  * parallel execution.
!  *
!  * The path we create here shouldn't be parameterized because of supposedly
!  * high startup cost of aggregation (whether due to build of hash table for
!  * AGG_HASHED strategy or due to explicit sort for AGG_SORTED).
!  *
!  * XXX IndexPath as an input for AGG_SORTED might seem to be an exception, but
!  * aggregation of its output is only beneficial if it's performed by multiple
!  * workers, i.e. the resulting path is partial (Besides parallel aggregation,
!  * the other use case of aggregation push-down is aggregation performed on
!  * remote database, but that has nothing to do with IndexScan). And partial
!  * path cannot be parameterized because it's semantically wrong to use it on
!  * the inner side of NL join.
!  */
! void
! create_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
! 					bool precheck, bool partial, AggStrategy aggstrategy)
! {
! 	List    *group_clauses = NIL;
! 	List	*group_exprs = NIL;
! 	List	*agg_exprs = NIL;
! 	Path	*agg_path;
! 
! 	/*
! 	 * If the AggPath should be partial, the subpath must be too, and
! 	 * therefore the subpath is essentially parallel_safe.
! 	 */
! 	Assert(subpath->parallel_safe || !partial);
! 
! 	/*
! 	 * Grouped path should never be parameterized, so we're not supposed to
! 	 * receive parameterized subpath.
! 	 */
! 	Assert(subpath->param_info == NULL);
! 
! 	/*
! 	 * Note that "partial" in the following function names refers to 2-stage
! 	 * aggregation, not to parallel processing.
! 	 */
! 	if (aggstrategy == AGG_HASHED)
! 		agg_path = (Path *) create_partial_agg_hashed_path(root, subpath,
! 														   true,
! 														   &group_clauses,
! 														   &group_exprs,
! 														   &agg_exprs,
! 														   subpath->rows);
! 	else if (aggstrategy == AGG_SORTED)
! 		agg_path = (Path *) create_partial_agg_sorted_path(root, subpath,
! 														   true,
! 														   &group_clauses,
! 														   &group_exprs,
! 														   &agg_exprs,
! 														   subpath->rows);
! 	else
! 		elog(ERROR, "unexpected strategy %d", aggstrategy);
! 
! 	/* Add the grouped path to the list of grouped base paths. */
! 	if (agg_path != NULL)
! 	{
! 		if (precheck)
! 		{
! 			List	*pathkeys;
! 
! 			/* AGG_HASHED is not supposed to generate sorted output. */
! 			pathkeys = aggstrategy == AGG_SORTED ? subpath->pathkeys : NIL;
! 
! 			if (!partial &&
! 				!add_path_precheck(rel, agg_path->startup_cost,
! 								   agg_path->total_cost, pathkeys, NULL,
! 								   true))
! 				return;
! 
! 			if (partial &&
! 				!add_partial_path_precheck(rel, agg_path->total_cost, pathkeys,
! 										   true))
! 				return;
! 		}
! 
! 		if (!partial)
! 			add_path(rel, (Path *) agg_path, true);
! 		else
! 			add_partial_path(rel, (Path *) agg_path, true);
! 	}
  }
  
  /*
*************** set_tablesample_rel_pathlist(PlannerInfo
*** 810,816 ****
  		path = (Path *) create_material_path(rel, path);
  	}
  
! 	add_path(rel, path);
  
  	/* For the moment, at least, there are no other paths to consider */
  }
--- 930,936 ----
  		path = (Path *) create_material_path(rel, path);
  	}
  
! 	add_path(rel, path, false);
  
  	/* For the moment, at least, there are no other paths to consider */
  }
*************** set_append_rel_size(PlannerInfo *root, R
*** 915,926 ****
  		childrel = find_base_rel(root, childRTindex);
  		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
  
  		/*
! 		 * We have to copy the parent's targetlist and quals to the child,
! 		 * with appropriate substitution of variables.  However, only the
! 		 * baserestrictinfo quals are needed before we can check for
! 		 * constraint exclusion; so do that first and then check to see if we
! 		 * can disregard this child.
  		 *
  		 * The child rel's targetlist might contain non-Var expressions, which
  		 * means that substitution into the quals could produce opportunities
--- 1035,1100 ----
  		childrel = find_base_rel(root, childRTindex);
  		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
  
+ 		if (rel->part_scheme)
+ 		{
+ 			AttrNumber		attno;
+ 
+ 			/*
+ 			 * For a partitioned table, individual partitions can participate
+ 			 * in the pair-wise joins. We need attr_needed data for building
+ 			 * targetlists of joins between partitions.
+ 			 */
+ 			for (attno = rel->min_attr; attno <= rel->max_attr; attno++)
+ 			{
+ 				int	index = attno - rel->min_attr;
+ 				Relids	attr_needed = bms_copy(rel->attr_needed[index]);
+ 
+ 				/* System attributes do not need translation. */
+ 				if (attno <= 0)
+ 				{
+ 					Assert(rel->min_attr == childrel->min_attr);
+ 					childrel->attr_needed[index] = attr_needed;
+ 				}
+ 				else
+ 				{
+ 					Var *var = list_nth(appinfo->translated_vars,
+ 										attno - 1);
+ 					int child_index;
+ 
+ 					/*
+ 					 * Parent Var for a user defined attribute translates to
+ 					 * child Var.
+ 					 */
+ 					Assert(IsA(var, Var));
+ 
+ 					child_index = var->varattno - childrel->min_attr;
+ 					childrel->attr_needed[child_index] = attr_needed;
+ 				}
+ 			}
+ 		}
+ 
  		/*
! 		 * Copy/modify targetlist. Even if this child is deemed empty, we need
! 		 * its targetlist in case it falls on the nullable side of a child-join
! 		 * because of partition-wise join.
! 		 *
! 		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
! 		 * expressions, which otherwise would not occur in a rel's targetlist.
! 		 * Code that might be looking at an appendrel child must cope with
! 		 * such.  (Normally, a rel's targetlist would only include Vars and
! 		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
! 		 * fields of childrel->reltarget; not clear if that would be useful.
! 		 */
! 		childrel->reltarget->exprs = (List *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) rel->reltarget->exprs,
! 								   1, &appinfo);
! 
! 		/*
! 		 * We have to copy the parent's quals to the child, with appropriate
! 		 * substitution of variables.  However, only the baserestrictinfo quals
! 		 * are needed before we can check for constraint exclusion; so do that
! 		 * first and then check to see if we can disregard this child.
  		 *
  		 * The child rel's targetlist might contain non-Var expressions, which
  		 * means that substitution into the quals could produce opportunities
*************** set_append_rel_size(PlannerInfo *root, R
*** 941,947 ****
  			Assert(IsA(rinfo, RestrictInfo));
  			childqual = adjust_appendrel_attrs(root,
  											   (Node *) rinfo->clause,
! 											   appinfo);
  			childqual = eval_const_expressions(root, childqual);
  			/* check for flat-out constant */
  			if (childqual && IsA(childqual, Const))
--- 1115,1121 ----
  			Assert(IsA(rinfo, RestrictInfo));
  			childqual = adjust_appendrel_attrs(root,
  											   (Node *) rinfo->clause,
! 											   1, &appinfo);
  			childqual = eval_const_expressions(root, childqual);
  			/* check for flat-out constant */
  			if (childqual && IsA(childqual, Const))
*************** set_append_rel_size(PlannerInfo *root, R
*** 1047,1070 ****
  			continue;
  		}
  
! 		/*
! 		 * CE failed, so finish copying/modifying targetlist and join quals.
! 		 *
! 		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
! 		 * expressions, which otherwise would not occur in a rel's targetlist.
! 		 * Code that might be looking at an appendrel child must cope with
! 		 * such.  (Normally, a rel's targetlist would only include Vars and
! 		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
! 		 * fields of childrel->reltarget; not clear if that would be useful.
! 		 */
  		childrel->joininfo = (List *)
  			adjust_appendrel_attrs(root,
  								   (Node *) rel->joininfo,
! 								   appinfo);
! 		childrel->reltarget->exprs = (List *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) rel->reltarget->exprs,
! 								   appinfo);
  
  		/*
  		 * We have to make child entries in the EquivalenceClass data
--- 1221,1231 ----
  			continue;
  		}
  
! 		/* CE failed, so finish copying/modifying join quals. */
  		childrel->joininfo = (List *)
  			adjust_appendrel_attrs(root,
  								   (Node *) rel->joininfo,
! 								   1, &appinfo);
  
  		/*
  		 * We have to make child entries in the EquivalenceClass data
*************** set_append_rel_size(PlannerInfo *root, R
*** 1079,1092 ****
  		childrel->has_eclass_joins = rel->has_eclass_joins;
  
  		/*
- 		 * Note: we could compute appropriate attr_needed data for the child's
- 		 * variables, by transforming the parent's attr_needed through the
- 		 * translated_vars mapping.  However, currently there's no need
- 		 * because attr_needed is only examined for base relations not
- 		 * otherrels.  So we just leave the child's attr_needed empty.
- 		 */
- 
- 		/*
  		 * If parallelism is allowable for this query in general, see whether
  		 * it's allowable for this childrel in particular.  But if we've
  		 * already decided the appendrel is not parallel-safe as a whole,
--- 1240,1245 ----
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1281,1299 ****
  	bool		subpaths_valid = true;
  	List	   *partial_subpaths = NIL;
  	bool		partial_subpaths_valid = true;
  	List	   *all_child_pathkeys = NIL;
  	List	   *all_child_outers = NIL;
  	ListCell   *l;
  	List	   *partitioned_rels = NIL;
  	RangeTblEntry *rte;
  
! 	rte = planner_rt_fetch(rel->relid, root);
! 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		partitioned_rels = get_partitioned_child_rels(root, rel->relid);
! 		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(partitioned_rels) >= 1);
  	}
  
  	/*
  	 * For every non-dummy child, remember the cheapest path.  Also, identify
--- 1434,1460 ----
  	bool		subpaths_valid = true;
  	List	   *partial_subpaths = NIL;
  	bool		partial_subpaths_valid = true;
+ 	List	   *grouped_subpaths = NIL;
+ 	bool		grouped_subpaths_valid = true;
  	List	   *all_child_pathkeys = NIL;
  	List	   *all_child_outers = NIL;
  	ListCell   *l;
  	List	   *partitioned_rels = NIL;
  	RangeTblEntry *rte;
  
! 	if (rel->reloptkind == RELOPT_BASEREL)
  	{
! 		rte = planner_rt_fetch(rel->relid, root);
! 
! 		if (rte->relkind == RELKIND_PARTITIONED_TABLE)
! 		{
! 			partitioned_rels = get_partitioned_child_rels(root, rel->relid);
! 			/* The root partitioned table is included as a child rel */
! 			Assert(list_length(partitioned_rels) >= 1);
! 		}
  	}
+ 	else if (rel->reloptkind == RELOPT_JOINREL && rel->part_scheme)
+ 		partitioned_rels = get_partitioned_child_rels_for_join(root, rel);
  
  	/*
  	 * For every non-dummy child, remember the cheapest path.  Also, identify
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1324,1329 ****
--- 1485,1521 ----
  			partial_subpaths_valid = false;
  
  		/*
+ 		 * For grouped paths, use only the unparameterized subpaths.
+ 		 *
+ 		 * XXX Consider if the parameterized subpaths should be processed
+ 		 * below. It's probably not useful for sequential scans (due to
+ 		 * repeated aggregation), but might be worthwhile for other child
+ 		 * nodes.
+ 		 */
+ 		if (childrel->gpi != NULL && childrel->gpi->pathlist != NIL)
+ 		{
+ 			Path	*path;
+ 
+ 			path = (Path *) linitial(childrel->gpi->pathlist);
+ 
+ 			/*
+ 			 * PoC only: Simulate remote aggregation, which seems to be the
+ 			 * typical use case for pushing the aggregation below Append node.
+ 			 */
+ 			path->startup_cost = 0.0;
+ 			path->total_cost = 0.0;
+ 
+ 			if (path->param_info == NULL)
+ 				grouped_subpaths = accumulate_append_subpath(grouped_subpaths,
+ 															 path);
+ 			else
+ 				grouped_subpaths_valid = false;
+ 		}
+ 		else
+ 			grouped_subpaths_valid = false;
+ 
+ 
+ 		/*
  		 * Collect lists of all the available path orderings and
  		 * parameterizations for all the children.  We use these as a
  		 * heuristic to indicate which sort orderings and parameterizations we
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1395,1401 ****
  	 */
  	if (subpaths_valid)
  		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
! 												  partitioned_rels));
  
  	/*
  	 * Consider an append of partial unordered, unparameterized partial paths.
--- 1587,1594 ----
  	 */
  	if (subpaths_valid)
  		add_path(rel, (Path *) create_append_path(rel, subpaths, NULL, 0,
! 					 partitioned_rels),
! 				 false);
  
  	/*
  	 * Consider an append of partial unordered, unparameterized partial paths.
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1422,1429 ****
  
  		/* Generate a partial append path. */
  		appendpath = create_append_path(rel, partial_subpaths, NULL,
! 										parallel_workers, partitioned_rels);
! 		add_partial_path(rel, (Path *) appendpath);
  	}
  
  	/*
--- 1615,1635 ----
  
  		/* Generate a partial append path. */
  		appendpath = create_append_path(rel, partial_subpaths, NULL,
! 										parallel_workers,
! 										partitioned_rels);
! 		add_partial_path(rel, (Path *) appendpath, false);
! 	}
! 
! 	/* TODO Also partial grouped paths? */
! 	if (grouped_subpaths_valid)
! 	{
! 		Path	*path;
! 
! 		path = (Path *) create_append_path(rel, grouped_subpaths, NULL, 0,
! 			partitioned_rels);
+ 		/* pathtarget will produce the grouped relation. */
! 		path->pathtarget = rel->gpi->target;
! 		add_path(rel, path, true);
  	}
  
  	/*
*************** add_paths_to_append_rel(PlannerInfo *roo
*** 1476,1482 ****
  		if (subpaths_valid)
  			add_path(rel, (Path *)
  					 create_append_path(rel, subpaths, required_outer, 0,
! 										partitioned_rels));
  	}
  }
  
--- 1682,1689 ----
  		if (subpaths_valid)
  			add_path(rel, (Path *)
  					 create_append_path(rel, subpaths, required_outer, 0,
! 						 partitioned_rels),
! 					 false);
  	}
  }
  
*************** generate_mergeappend_paths(PlannerInfo *
*** 1572,1585 ****
  														startup_subpaths,
  														pathkeys,
  														NULL,
! 														partitioned_rels));
  		if (startup_neq_total)
  			add_path(rel, (Path *) create_merge_append_path(root,
  															rel,
  															total_subpaths,
  															pathkeys,
  															NULL,
! 															partitioned_rels));
  	}
  }
  
--- 1779,1794 ----
  														startup_subpaths,
  														pathkeys,
  														NULL,
! 														partitioned_rels),
! 				 false);
  		if (startup_neq_total)
  			add_path(rel, (Path *) create_merge_append_path(root,
  															rel,
  															total_subpaths,
  															pathkeys,
  															NULL,
! 															partitioned_rels),
! 					 false);
  	}
  }
  
*************** set_dummy_rel_pathlist(RelOptInfo *rel)
*** 1712,1718 ****
  	rel->pathlist = NIL;
  	rel->partial_pathlist = NIL;
  
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
  
  	/*
  	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
--- 1921,1927 ----
  	rel->pathlist = NIL;
  	rel->partial_pathlist = NIL;
  
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL), false);
  
  	/*
  	 * We set the cheapest path immediately, to ensure that IS_DUMMY_REL()
*************** set_subquery_pathlist(PlannerInfo *root,
*** 1926,1932 ****
  		/* Generate outer path using this subpath */
  		add_path(rel, (Path *)
  				 create_subqueryscan_path(root, rel, subpath,
! 										  pathkeys, required_outer));
  	}
  }
  
--- 2135,2141 ----
  		/* Generate outer path using this subpath */
  		add_path(rel, (Path *)
  				 create_subqueryscan_path(root, rel, subpath,
! 										  pathkeys, required_outer), false);
  	}
  }
  
*************** set_function_pathlist(PlannerInfo *root,
*** 1995,2001 ****
  
  	/* Generate appropriate path */
  	add_path(rel, create_functionscan_path(root, rel,
! 										   pathkeys, required_outer));
  }
  
  /*
--- 2204,2210 ----
  
  	/* Generate appropriate path */
  	add_path(rel, create_functionscan_path(root, rel,
! 										   pathkeys, required_outer), false);
  }
  
  /*
*************** set_values_pathlist(PlannerInfo *root, R
*** 2015,2021 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_valuesscan_path(root, rel, required_outer));
  }
  
  /*
--- 2224,2230 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_valuesscan_path(root, rel, required_outer), false);
  }
  
  /*
*************** set_tablefunc_pathlist(PlannerInfo *root
*** 2036,2042 ****
  
  	/* Generate appropriate path */
  	add_path(rel, create_tablefuncscan_path(root, rel,
! 											required_outer));
  }
  
  /*
--- 2245,2251 ----
  
  	/* Generate appropriate path */
  	add_path(rel, create_tablefuncscan_path(root, rel,
! 											required_outer), false);
  }
  
  /*
*************** set_cte_pathlist(PlannerInfo *root, RelO
*** 2102,2108 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_ctescan_path(root, rel, required_outer));
  }
  
  /*
--- 2311,2317 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_ctescan_path(root, rel, required_outer), false);
  }
  
  /*
*************** set_namedtuplestore_pathlist(PlannerInfo
*** 2129,2135 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_namedtuplestorescan_path(root, rel, required_outer));
  
  	/* Select cheapest path (pretty easy in this case...) */
  	set_cheapest(rel);
--- 2338,2345 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_namedtuplestorescan_path(root, rel, required_outer),
! 			 false);
  
  	/* Select cheapest path (pretty easy in this case...) */
  	set_cheapest(rel);
*************** set_worktable_pathlist(PlannerInfo *root
*** 2182,2188 ****
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_worktablescan_path(root, rel, required_outer));
  }
  
  /*
--- 2392,2399 ----
  	required_outer = rel->lateral_relids;
  
  	/* Generate appropriate path */
! 	add_path(rel, create_worktablescan_path(root, rel, required_outer),
! 			 false);
  }
  
  /*
*************** set_worktable_pathlist(PlannerInfo *root
*** 2195,2208 ****
   * path that some GatherPath or GatherMergePath has a reference to.)
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
  	ListCell   *lc;
  
  	/* If there are no partial paths, there's nothing to do here. */
! 	if (rel->partial_pathlist == NIL)
  		return;
  
  	/*
--- 2406,2426 ----
   * path that some GatherPath or GatherMergePath has a reference to.)
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool grouped)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
+ 	List	   *pathlist = NIL;
+ 	PathTarget *partial_target;
  	ListCell   *lc;
  
+ 	if (!grouped)
+ 		pathlist = rel->partial_pathlist;
+ 	else if (rel->gpi != NULL)
+ 		pathlist = rel->gpi->partial_pathlist;
+ 
  	/* If there are no partial paths, there's nothing to do here. */
! 	if (pathlist == NIL)
  		return;
  
  	/*
*************** generate_gather_paths(PlannerInfo *root,
*** 2210,2226 ****
  	 * path of interest: the cheapest one.  That will be the one at the front
  	 * of partial_pathlist because of the way add_partial_path works.
  	 */
! 	cheapest_partial_path = linitial(rel->partial_pathlist);
  	simple_gather_path = (Path *)
! 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  						   NULL, NULL);
! 	add_path(rel, simple_gather_path);
  
  	/*
  	 * For each useful ordering, we can consider an order-preserving Gather
  	 * Merge.
  	 */
! 	foreach (lc, rel->partial_pathlist)
  	{
  		Path   *subpath = (Path *) lfirst(lc);
  		GatherMergePath   *path;
--- 2428,2450 ----
  	 * path of interest: the cheapest one.  That will be the one at the front
  	 * of partial_pathlist because of the way add_partial_path works.
  	 */
! 	cheapest_partial_path = linitial(pathlist);
! 
! 	if (!grouped)
! 		partial_target = rel->reltarget;
! 	else if (rel->gpi != NULL)
! 		partial_target = rel->gpi->target;
! 
  	simple_gather_path = (Path *)
! 		create_gather_path(root, rel, cheapest_partial_path, partial_target,
  						   NULL, NULL);
! 	add_path(rel, simple_gather_path, grouped);
  
  	/*
  	 * For each useful ordering, we can consider an order-preserving Gather
  	 * Merge.
  	 */
! 	foreach (lc, pathlist)
  	{
  		Path   *subpath = (Path *) lfirst(lc);
  		GatherMergePath   *path;
*************** generate_gather_paths(PlannerInfo *root,
*** 2228,2236 ****
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
  										subpath->pathkeys, NULL, NULL);
! 		add_path(rel, &path->path);
  	}
  }
  
--- 2452,2460 ----
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = create_gather_merge_path(root, rel, subpath, partial_target,
  										subpath->pathkeys, NULL, NULL);
! 		add_path(rel, &path->path, grouped);
  	}
  }
  
*************** standard_join_search(PlannerInfo *root,
*** 2388,2402 ****
  		 * Run generate_gather_paths() for each just-processed joinrel.  We
  		 * could not do this earlier because both regular and partial paths
  		 * can get added to a particular joinrel at multiple times within
! 		 * join_search_one_level.  After that, we're done creating paths for
! 		 * the joinrel, so run set_cheapest().
  		 */
  		foreach(lc, root->join_rel_level[lev])
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
  			/* Create GatherPaths for any useful partial paths for rel */
! 			generate_gather_paths(root, rel);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
--- 2612,2641 ----
  		 * Run generate_gather_paths() for each just-processed joinrel.  We
  		 * could not do this earlier because both regular and partial paths
  		 * can get added to a particular joinrel at multiple times within
! 		 * join_search_one_level.
! 		 *
! 		 * Similarly, create paths for joinrels that use the partition-wise
! 		 * join technique. We could not do this earlier because paths can get
! 		 * added to a particular child-join multiple times within
! 		 * join_search_one_level.
! 		 *
! 		 * After that, we're done creating paths for the joinrel, so run
! 		 * set_cheapest().
  		 */
  		foreach(lc, root->join_rel_level[lev])
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
+ 			/*
+ 			 * Create paths for partition-wise joins. Do this before creating
+ 			 * GatherPaths so that partial "append" paths in partitioned joins
+ 			 * will be considered.
+ 			 */
+ 			generate_partition_wise_join_paths(root, rel);
+ 
  			/* Create GatherPaths for any useful partial paths for rel */
! 			generate_gather_paths(root, rel, false);
! 			generate_gather_paths(root, rel, true);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
*************** create_partial_bitmap_paths(PlannerInfo
*** 3047,3053 ****
  		return;
  
  	add_partial_path(rel, (Path *) create_bitmap_heap_path(root, rel,
! 					bitmapqual, rel->lateral_relids, 1.0, parallel_workers));
  }
  
  /*
--- 3286,3292 ----
  		return;
  
  	add_partial_path(rel, (Path *) create_bitmap_heap_path(root, rel,
! 				   bitmapqual, rel->lateral_relids, 1.0, parallel_workers), false);
  }
  
  /*
*************** compute_parallel_worker(RelOptInfo *rel,
*** 3142,3147 ****
--- 3381,3454 ----
  	return parallel_workers;
  }
  
+ /*
+  * generate_partition_wise_join_paths
+  *
+  * 		Create paths representing partition-wise join for given partitioned
+  * 		join relation.
+  *
+  * This must not be called until after we are done adding paths for all
+  * child-joins. (Otherwise, add_path might delete a path that some "append"
+  * path has a reference to.)
+  */
+ void
+ generate_partition_wise_join_paths(PlannerInfo *root, RelOptInfo *rel)
+ {
+ 	List   *live_children = NIL;
+ 	int		cnt_parts;
+ 	int		num_parts;
+ 	RelOptInfo	   **part_rels;
+ 
+ 	/* Handle only join relations. */
+ 	if (!IS_JOIN_REL(rel))
+ 		return;
+ 
+ 	/* If the relation is not partitioned or is proven dummy, nothing to do. */
+ 	if (!rel->part_scheme || !rel->boundinfo || IS_DUMMY_REL(rel))
+ 		return;
+ 
+ 	/* A partitioned join should have RelOptInfos of the child-joins. */
+ 	Assert(rel->part_rels && rel->nparts > 0);
+ 
+ 	/* Guard against stack overflow due to overly deep partition hierarchy. */
+ 	check_stack_depth();
+ 
+ 	num_parts = rel->nparts;
+ 	part_rels = rel->part_rels;
+ 
+ 	/* Collect non-dummy child-joins. */
+ 	for (cnt_parts = 0; cnt_parts < num_parts; cnt_parts++)
+ 	{
+ 		RelOptInfo *child_rel = part_rels[cnt_parts];
+ 
+ 		/* Add partition-wise join paths for partitioned child-joins. */
+ 		generate_partition_wise_join_paths(root, child_rel);
+ 
+ 		/* Dummy children will not be scanned, so ignore those. */
+ 		if (IS_DUMMY_REL(child_rel))
+ 			continue;
+ 
+ 		set_cheapest(child_rel);
+ 
+ #ifdef OPTIMIZER_DEBUG
+ 		debug_print_rel(root, child_rel);
+ #endif
+ 
+ 		live_children = lappend(live_children, child_rel);
+ 	}
+ 
+ 	/* If all child-joins are dummy, parent join is also dummy. */
+ 	if (!live_children)
+ 	{
+ 		mark_dummy_rel(rel);
+ 		return;
+ 	}
+ 
+ 	/* Add "append" paths containing paths from child-joins. */
+ 	add_paths_to_append_rel(root, rel, live_children);
+ 	list_free(live_children);
+ }
+ 
  
  /*****************************************************************************
   *			DEBUG SUPPORT
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
new file mode 100644
index 52643d0..f278b77
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** bool		enable_material = true;
*** 127,132 ****
--- 127,133 ----
  bool		enable_mergejoin = true;
  bool		enable_hashjoin = true;
  bool		enable_gathermerge = true;
+ bool		enable_partition_wise_join = false;
  
  typedef struct
  {
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
new file mode 100644
index 67bd760..780ea04
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*************** generate_join_implied_equalities_broken(
*** 1329,1335 ****
  	if (IS_OTHER_REL(inner_rel) && result != NIL)
  		result = (List *) adjust_appendrel_attrs_multilevel(root,
  															(Node *) result,
! 															inner_rel);
  
  	return result;
  }
--- 1329,1336 ----
  	if (IS_OTHER_REL(inner_rel) && result != NIL)
  		result = (List *) adjust_appendrel_attrs_multilevel(root,
  															(Node *) result,
! 															inner_rel->relids,
! 												 inner_rel->top_parent_relids);
  
  	return result;
  }
*************** add_child_rel_equivalences(PlannerInfo *
*** 2112,2118 ****
  				child_expr = (Expr *)
  					adjust_appendrel_attrs(root,
  										   (Node *) cur_em->em_expr,
! 										   appinfo);
  
  				/*
  				 * Transform em_relids to match.  Note we do *not* do
--- 2113,2119 ----
  				child_expr = (Expr *)
  					adjust_appendrel_attrs(root,
  										   (Node *) cur_em->em_expr,
! 										   1, &appinfo);
  
  				/*
  				 * Transform em_relids to match.  Note we do *not* do
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
new file mode 100644
index 6e4bae8..a6fa713
*** a/src/backend/optimizer/path/indxpath.c
--- b/src/backend/optimizer/path/indxpath.c
***************
*** 32,37 ****
--- 32,38 ----
  #include "optimizer/predtest.h"
  #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "utils/builtins.h"
  #include "utils/bytea.h"
*************** static bool eclass_already_used(Equivale
*** 107,119 ****
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths);
  static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				  IndexOptInfo *index, IndexClauseSet *clauses,
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop);
  static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
  				   List *clauses, List *other_clauses);
  static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
--- 108,121 ----
  static bool bms_equal_any(Relids relids, List *relids_list);
  static void get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths, bool grouped);
  static List *build_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				  IndexOptInfo *index, IndexClauseSet *clauses,
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				   bool *skip_lower_saop,
! 				   bool grouped);
  static List *build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
  				   List *clauses, List *other_clauses);
  static List *generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo *rel,
*************** static Const *string_to_const(const char
*** 229,235 ****
   * as meaning "unparameterized so far as the indexquals are concerned".
   */
  void
! create_index_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	List	   *indexpaths;
  	List	   *bitindexpaths;
--- 231,237 ----
   * as meaning "unparameterized so far as the indexquals are concerned".
   */
  void
! create_index_paths(PlannerInfo *root, RelOptInfo *rel, bool grouped)
  {
  	List	   *indexpaths;
  	List	   *bitindexpaths;
*************** create_index_paths(PlannerInfo *root, Re
*** 274,281 ****
  		 * non-parameterized paths.  Plain paths go directly to add_path(),
  		 * bitmap paths are added to bitindexpaths to be handled below.
  		 */
! 		get_index_paths(root, rel, index, &rclauseset,
! 						&bitindexpaths);
  
  		/*
  		 * Identify the join clauses that can match the index.  For the moment
--- 276,283 ----
  		 * non-parameterized paths.  Plain paths go directly to add_path(),
  		 * bitmap paths are added to bitindexpaths to be handled below.
  		 */
! 		get_index_paths(root, rel, index, &rclauseset, &bitindexpaths,
! 						grouped);
  
  		/*
  		 * Identify the join clauses that can match the index.  For the moment
*************** create_index_paths(PlannerInfo *root, Re
*** 338,344 ****
  		bitmapqual = choose_bitmap_and(root, rel, bitindexpaths);
  		bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  										rel->lateral_relids, 1.0, 0);
! 		add_path(rel, (Path *) bpath);
  
  		/* create a partial bitmap heap path */
  		if (rel->consider_parallel && rel->lateral_relids == NULL)
--- 340,346 ----
  		bitmapqual = choose_bitmap_and(root, rel, bitindexpaths);
  		bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  										rel->lateral_relids, 1.0, 0);
! 		add_path(rel, (Path *) bpath, false);
  
  		/* create a partial bitmap heap path */
  		if (rel->consider_parallel && rel->lateral_relids == NULL)
*************** create_index_paths(PlannerInfo *root, Re
*** 415,421 ****
  			loop_count = get_loop_count(root, rel->relid, required_outer);
  			bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  											required_outer, loop_count, 0);
! 			add_path(rel, (Path *) bpath);
  		}
  	}
  }
--- 417,423 ----
  			loop_count = get_loop_count(root, rel->relid, required_outer);
  			bpath = create_bitmap_heap_path(root, rel, bitmapqual,
  											required_outer, loop_count, 0);
! 			add_path(rel, (Path *) bpath, false);
  		}
  	}
  }
*************** get_join_index_paths(PlannerInfo *root,
*** 667,673 ****
  	Assert(clauseset.nonempty);
  
  	/* Build index path(s) using the collected set of clauses */
! 	get_index_paths(root, rel, index, &clauseset, bitindexpaths);
  
  	/*
  	 * Remember we considered paths for this set of relids.  We use lcons not
--- 669,675 ----
  	Assert(clauseset.nonempty);
  
  	/* Build index path(s) using the collected set of clauses */
! 	get_index_paths(root, rel, index, &clauseset, bitindexpaths, false);
  
  	/*
  	 * Remember we considered paths for this set of relids.  We use lcons not
*************** bms_equal_any(Relids relids, List *relid
*** 736,742 ****
  static void
  get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths)
  {
  	List	   *indexpaths;
  	bool		skip_nonnative_saop = false;
--- 738,744 ----
  static void
  get_index_paths(PlannerInfo *root, RelOptInfo *rel,
  				IndexOptInfo *index, IndexClauseSet *clauses,
! 				List **bitindexpaths, bool grouped)
  {
  	List	   *indexpaths;
  	bool		skip_nonnative_saop = false;
*************** get_index_paths(PlannerInfo *root, RelOp
*** 754,760 ****
  								   index->predOK,
  								   ST_ANYSCAN,
  								   &skip_nonnative_saop,
! 								   &skip_lower_saop);
  
  	/*
  	 * If we skipped any lower-order ScalarArrayOpExprs on an index with an AM
--- 756,762 ----
  								   index->predOK,
  								   ST_ANYSCAN,
  								   &skip_nonnative_saop,
! 								   &skip_lower_saop, grouped);
  
  	/*
  	 * If we skipped any lower-order ScalarArrayOpExprs on an index with an AM
*************** get_index_paths(PlannerInfo *root, RelOp
*** 769,775 ****
  												   index->predOK,
  												   ST_ANYSCAN,
  												   &skip_nonnative_saop,
! 												   NULL));
  	}
  
  	/*
--- 771,777 ----
  												   index->predOK,
  												   ST_ANYSCAN,
  												   &skip_nonnative_saop,
! 												   NULL, grouped));
  	}
  
  	/*
*************** get_index_paths(PlannerInfo *root, RelOp
*** 789,797 ****
  		IndexPath  *ipath = (IndexPath *) lfirst(lc);
  
  		if (index->amhasgettuple)
! 			add_path(rel, (Path *) ipath);
  
! 		if (index->amhasgetbitmap &&
  			(ipath->path.pathkeys == NIL ||
  			 ipath->indexselectivity < 1.0))
  			*bitindexpaths = lappend(*bitindexpaths, ipath);
--- 791,799 ----
  		IndexPath  *ipath = (IndexPath *) lfirst(lc);
  
  		if (index->amhasgettuple)
! 			add_path(rel, (Path *) ipath, grouped);
  
! 		if (!grouped && index->amhasgetbitmap &&
  			(ipath->path.pathkeys == NIL ||
  			 ipath->indexselectivity < 1.0))
  			*bitindexpaths = lappend(*bitindexpaths, ipath);
*************** get_index_paths(PlannerInfo *root, RelOp
*** 802,815 ****
  	 * natively, generate bitmap scan paths relying on executor-managed
  	 * ScalarArrayOpExpr.
  	 */
! 	if (skip_nonnative_saop)
  	{
  		indexpaths = build_index_paths(root, rel,
  									   index, clauses,
  									   false,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL);
  		*bitindexpaths = list_concat(*bitindexpaths, indexpaths);
  	}
  }
--- 804,818 ----
  	 * natively, generate bitmap scan paths relying on executor-managed
  	 * ScalarArrayOpExpr.
  	 */
! 	if (!grouped && skip_nonnative_saop)
  	{
  		indexpaths = build_index_paths(root, rel,
  									   index, clauses,
  									   false,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL,
! 									   false);
  		*bitindexpaths = list_concat(*bitindexpaths, indexpaths);
  	}
  }
*************** build_index_paths(PlannerInfo *root, Rel
*** 861,867 ****
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop)
  {
  	List	   *result = NIL;
  	IndexPath  *ipath;
--- 864,870 ----
  				  bool useful_predicate,
  				  ScanTypeControl scantype,
  				  bool *skip_nonnative_saop,
! 				  bool *skip_lower_saop, bool grouped)
  {
  	List	   *result = NIL;
  	IndexPath  *ipath;
*************** build_index_paths(PlannerInfo *root, Rel
*** 878,883 ****
--- 881,890 ----
  	bool		index_is_ordered;
  	bool		index_only_scan;
  	int			indexcol;
+ 	bool		can_agg_sorted;
+ 	List		*group_clauses, *group_exprs, *agg_exprs;
+ 	AggPath		*agg_path;
+ 	double		agg_input_rows;
  
  	/*
  	 * Check that index supports the desired scan type(s)
*************** build_index_paths(PlannerInfo *root, Rel
*** 891,896 ****
--- 898,906 ----
  		case ST_BITMAPSCAN:
  			if (!index->amhasgetbitmap)
  				return NIL;
+ 
+ 			if (grouped)
+ 				return NIL;
  			break;
  		case ST_ANYSCAN:
  			/* either or both are OK */
*************** build_index_paths(PlannerInfo *root, Rel
*** 1032,1037 ****
--- 1042,1051 ----
  	 * later merging or final output ordering, OR the index has a useful
  	 * predicate, OR an index-only scan is possible.
  	 */
+ 	can_agg_sorted = true;
+ 	group_clauses = NIL;
+ 	group_exprs = NIL;
+ 	agg_exprs = NIL;
  	if (index_clauses != NIL || useful_pathkeys != NIL || useful_predicate ||
  		index_only_scan)
  	{
*************** build_index_paths(PlannerInfo *root, Rel
*** 1048,1054 ****
  								  outer_relids,
  								  loop_count,
  								  false);
! 		result = lappend(result, ipath);
  
  		/*
  		 * If appropriate, consider parallel index scan.  We don't allow
--- 1062,1086 ----
  								  outer_relids,
  								  loop_count,
  								  false);
! 		if (!grouped)
! 			result = lappend(result, ipath);
! 		else
! 		{
! 			/* TODO Double-check if this is the correct input value. */
! 			agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 			agg_path = create_partial_agg_sorted_path(root, (Path *) ipath,
! 													  true,
! 													  &group_clauses,
! 													  &group_exprs,
! 													  &agg_exprs,
! 													  agg_input_rows);
! 
! 			if (agg_path != NULL)
! 				result = lappend(result, agg_path);
! 			else
! 				can_agg_sorted = false;
! 		}
  
  		/*
  		 * If appropriate, consider parallel index scan.  We don't allow
*************** build_index_paths(PlannerInfo *root, Rel
*** 1077,1083 ****
  			 * using parallel workers, just free it.
  			 */
  			if (ipath->path.parallel_workers > 0)
! 				add_partial_path(rel, (Path *) ipath);
  			else
  				pfree(ipath);
  		}
--- 1109,1139 ----
  			 * using parallel workers, just free it.
  			 */
  			if (ipath->path.parallel_workers > 0)
! 			{
! 				if (!grouped)
! 					add_partial_path(rel, (Path *) ipath, grouped);
! 				else if (can_agg_sorted && outer_relids == NULL)
! 				{
! 					/* TODO Double-check if this is the correct input value. */
! 					agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 					agg_path = create_partial_agg_sorted_path(root,
! 															  (Path *) ipath,
! 															  false,
! 															  &group_clauses,
! 															  &group_exprs,
! 															  &agg_exprs,
! 															  agg_input_rows);
! 
! 					/*
! 					 * If create_partial_agg_sorted_path succeeded once, it
! 					 * should always succeed.
! 					 */
! 					Assert(agg_path != NULL);
! 
! 					add_partial_path(rel, (Path *) agg_path, grouped);
! 				}
! 			}
  			else
  				pfree(ipath);
  		}
*************** build_index_paths(PlannerInfo *root, Rel
*** 1105,1111 ****
  									  outer_relids,
  									  loop_count,
  									  false);
! 			result = lappend(result, ipath);
  
  			/* If appropriate, consider parallel index scan */
  			if (index->amcanparallel &&
--- 1161,1185 ----
  									  outer_relids,
  									  loop_count,
  									  false);
! 
! 			if (!grouped)
! 				result = lappend(result, ipath);
! 			else if (can_agg_sorted)
! 			{
! 				/* TODO Double-check if this is the correct input value. */
! 				agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 				agg_path = create_partial_agg_sorted_path(root,
! 														  (Path *) ipath,
! 														  true,
! 														  &group_clauses,
! 														  &group_exprs,
! 														  &agg_exprs,
! 														  agg_input_rows);
! 
! 				Assert(agg_path != NULL);
! 				result = lappend(result, agg_path);
! 			}
  
  			/* If appropriate, consider parallel index scan */
  			if (index->amcanparallel &&
*************** build_index_paths(PlannerInfo *root, Rel
*** 1129,1135 ****
  				 * using parallel workers, just free it.
  				 */
  				if (ipath->path.parallel_workers > 0)
! 					add_partial_path(rel, (Path *) ipath);
  				else
  					pfree(ipath);
  			}
--- 1203,1227 ----
  				 * using parallel workers, just free it.
  				 */
  				if (ipath->path.parallel_workers > 0)
! 				{
! 					if (!grouped)
! 						add_partial_path(rel, (Path *) ipath, grouped);
! 					else if (can_agg_sorted && outer_relids == NULL)
! 					{
! 						/* TODO Double-check if this is the correct input value. */
! 						agg_input_rows =  rel->rows * ipath->indexselectivity;
! 
! 						agg_path = create_partial_agg_sorted_path(root,
! 																  (Path *) ipath,
! 																  false,
! 																  &group_clauses,
! 																  &group_exprs,
! 																  &agg_exprs,
! 																  agg_input_rows);
! 						Assert(agg_path != NULL);
! 						add_partial_path(rel, (Path *) agg_path, grouped);
! 					}
! 				}
  				else
  					pfree(ipath);
  			}
*************** build_paths_for_OR(PlannerInfo *root, Re
*** 1244,1250 ****
  									   useful_predicate,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL);
  		result = list_concat(result, indexpaths);
  	}
  
--- 1336,1343 ----
  									   useful_predicate,
  									   ST_BITMAPSCAN,
  									   NULL,
! 									   NULL,
! 									   false);
  		result = list_concat(result, indexpaths);
  	}
  
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
new file mode 100644
index 5aedcd1..f25719f
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***************
*** 22,34 ****
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
  
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
! #define PATH_PARAM_BY_REL(path, rel)  \
  	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
  
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
--- 22,45 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/tlist.h"
  
  /* Hook for plugins to get control in add_paths_to_joinrel() */
  set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
  
! /*
!  * Paths parameterized by the parent can be considered to be parameterized by
!  * any of its children.
!  */
! #define PATH_PARAM_BY_PARENT(path, rel)	\
! 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path),	\
! 									   (rel)->top_parent_relids))
! #define PATH_PARAM_BY_REL_SELF(path, rel)  \
  	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path), (rel)->relids))
  
+ #define PATH_PARAM_BY_REL(path, rel)	\
+ 	(PATH_PARAM_BY_REL_SELF(path, rel) || PATH_PARAM_BY_PARENT(path, rel))
+ 
  static void try_partial_mergejoin_path(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   Path *outer_path,
*************** static void try_partial_mergejoin_path(P
*** 38,66 ****
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra);
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
! 					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static void consider_parallel_nestloop(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra);
  static void consider_parallel_mergejoin(PlannerInfo *root,
  							RelOptInfo *joinrel,
  							RelOptInfo *outerrel,
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total);
  static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra);
  static List *select_mergejoin_clauses(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *outerrel,
--- 49,97 ----
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped,
! 						   bool do_aggregate);
  static void sort_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
! 								 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 								 JoinType jointype, JoinPathExtraData *extra,
! 								 bool grouped);
! static void sort_inner_and_outer_common(PlannerInfo *root,
! 										RelOptInfo *joinrel,
! 										RelOptInfo *outerrel,
! 										RelOptInfo *innerrel,
! 										JoinType jointype,
! 										JoinPathExtraData *extra,
! 										bool grouped_outer,
! 										bool grouped_inner,
! 										bool do_aggregate);
  static void match_unsorted_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra,
! 					 bool grouped);
  static void consider_parallel_nestloop(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped, bool do_aggregate);
  static void consider_parallel_mergejoin(PlannerInfo *root,
  							RelOptInfo *joinrel,
  							RelOptInfo *outerrel,
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total,
! 							bool grouped);
  static void hash_inner_and_outer(PlannerInfo *root, RelOptInfo *joinrel,
  					 RelOptInfo *outerrel, RelOptInfo *innerrel,
! 					 JoinType jointype, JoinPathExtraData *extra,
! 					 bool grouped);
! static bool is_grouped_join_target_complete(PlannerInfo *root,
! 											PathTarget *jointarget,
! 											Path *outer_path,
! 											Path *inner_path);
  static List *select_mergejoin_clauses(PlannerInfo *root,
  						 RelOptInfo *joinrel,
  						 RelOptInfo *outerrel,
*************** static void generate_mergejoin_paths(Pla
*** 77,83 ****
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial);
  
  
  /*
--- 108,117 ----
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial,
!  						 bool grouped_outer,
! 						 bool grouped_inner,
! 						 bool do_aggregate);
  
  
  /*
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 115,120 ****
--- 149,167 ----
  	JoinPathExtraData extra;
  	bool		mergejoin_allowed = true;
  	ListCell   *lc;
+ 	Relids		joinrelids;
+ 
+ 	/*
+ 	 * PlannerInfo doesn't contain the SpecialJoinInfos created for joins
+ 	 * between child relations, even if there is a SpecialJoinInfo node for
+ 	 * the join between the topmost parents. Hence while calculating Relids
+ 	 * set representing the restriction, consider relids of topmost parent
+ 	 * of partitions.
+ 	 */
+ 	if (joinrel->reloptkind == RELOPT_OTHER_JOINREL)
+ 		joinrelids = joinrel->top_parent_relids;
+ 	else
+ 		joinrelids = joinrel->relids;
  
  	extra.restrictlist = restrictlist;
  	extra.mergeclause_list = NIL;
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 197,212 ****
  		 * join has already been proven legal.)  If the SJ is relevant, it
  		 * presents constraints for joining to anything not in its RHS.
  		 */
! 		if (bms_overlap(joinrel->relids, sjinfo2->min_righthand) &&
! 			!bms_overlap(joinrel->relids, sjinfo2->min_lefthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													sjinfo2->min_righthand));
  
  		/* full joins constrain both sides symmetrically */
  		if (sjinfo2->jointype == JOIN_FULL &&
! 			bms_overlap(joinrel->relids, sjinfo2->min_lefthand) &&
! 			!bms_overlap(joinrel->relids, sjinfo2->min_righthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													 sjinfo2->min_lefthand));
--- 244,259 ----
  		 * join has already been proven legal.)  If the SJ is relevant, it
  		 * presents constraints for joining to anything not in its RHS.
  		 */
! 		if (bms_overlap(joinrelids, sjinfo2->min_righthand) &&
! 			!bms_overlap(joinrelids, sjinfo2->min_lefthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													sjinfo2->min_righthand));
  
  		/* full joins constrain both sides symmetrically */
  		if (sjinfo2->jointype == JOIN_FULL &&
! 			bms_overlap(joinrelids, sjinfo2->min_lefthand) &&
! 			!bms_overlap(joinrelids, sjinfo2->min_righthand))
  			extra.param_source_rels = bms_join(extra.param_source_rels,
  										   bms_difference(root->all_baserels,
  													 sjinfo2->min_lefthand));
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 227,234 ****
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
  	if (mergejoin_allowed)
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  	/*
  	 * 2. Consider paths where the outer relation need not be explicitly
--- 274,285 ----
  	 * sorted.  Skip this if we can't mergejoin.
  	 */
  	if (mergejoin_allowed)
+ 	{
  		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		sort_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  	/*
  	 * 2. Consider paths where the outer relation need not be explicitly
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 238,245 ****
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
  	if (mergejoin_allowed)
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  #ifdef NOT_USED
  
--- 289,300 ----
  	 * joins at all, so it wouldn't work in the prohibited cases either.)
  	 */
  	if (mergejoin_allowed)
+ 	{
  		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		match_unsorted_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  #ifdef NOT_USED
  
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 265,272 ****
  	 * joins, because there may be no other alternative.
  	 */
  	if (enable_hashjoin || jointype == JOIN_FULL)
  		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra);
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
--- 320,331 ----
  	 * joins, because there may be no other alternative.
  	 */
  	if (enable_hashjoin || jointype == JOIN_FULL)
+ 	{
  		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, false);
! 		hash_inner_and_outer(root, joinrel, outerrel, innerrel,
! 							 jointype, &extra, true);
! 	}
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
*************** add_paths_to_joinrel(PlannerInfo *root,
*** 304,321 ****
   */
  static inline bool
  allow_star_schema_join(PlannerInfo *root,
! 					   Path *outer_path,
! 					   Path *inner_path)
  {
- 	Relids		innerparams = PATH_REQ_OUTER(inner_path);
- 	Relids		outerrelids = outer_path->parent->relids;
- 
  	/*
  	 * It's a star-schema case if the outer rel provides some but not all of
  	 * the inner rel's parameterization.
  	 */
! 	return (bms_overlap(innerparams, outerrelids) &&
! 			bms_nonempty_difference(innerparams, outerrelids));
  }
  
  /*
--- 363,377 ----
   */
  static inline bool
  allow_star_schema_join(PlannerInfo *root,
! 					   Relids outerrelids,
! 					   Relids inner_paramrels)
  {
  	/*
  	 * It's a star-schema case if the outer rel provides some but not all of
  	 * the inner rel's parameterization.
  	 */
! 	return (bms_overlap(inner_paramrels, outerrelids) &&
! 			bms_nonempty_difference(inner_paramrels, outerrelids));
  }
  
  /*
*************** try_nestloop_path(PlannerInfo *root,
*** 330,339 ****
  				  Path *inner_path,
  				  List *pathkeys,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	/*
  	 * Check to see if proposed path is still parameterized, and reject if the
--- 386,427 ----
  				  Path *inner_path,
  				  List *pathkeys,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra,
! 				  bool grouped,
! 				  bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	RelOptInfo *innerrel = inner_path->parent;
+ 	RelOptInfo *outerrel = outer_path->parent;
+ 	Relids		innerrelids;
+ 	Relids		outerrelids;
+ 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
+ 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
+  	Path		*join_path;
+  	PathTarget	*join_target;
+ 
+  	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
+ 
+ 	/*
+ 	 * Parameterized paths in the child relations (base or join) are
+ 	 * parameterized by top-level parent. Any paths we will create to be
+ 	 * parameterized by the child relations are not added to the
+ 	 * pathlist. Hence run parameterization tests on the parent relids.
+ 	 */
+ 	if (innerrel->top_parent_relids)
+ 		innerrelids = innerrel->top_parent_relids;
+ 	else
+ 		innerrelids = innerrel->relids;
+ 
+ 	if (outerrel->top_parent_relids)
+ 		outerrelids = outerrel->top_parent_relids;
+ 	else
+ 		outerrelids = outerrel->relids;
  
  	/*
  	 * Check to see if proposed path is still parameterized, and reject if the
*************** try_nestloop_path(PlannerInfo *root,
*** 341,359 ****
  	 * says to allow it anyway.  Also, we must reject if have_dangerous_phv
  	 * doesn't like the look of it, which could only happen if the nestloop is
  	 * still parameterized.
  	 */
! 	required_outer = calc_nestloop_required_outer(outer_path,
! 												  inner_path);
! 	if (required_outer &&
! 		((!bms_overlap(required_outer, extra->param_source_rels) &&
! 		  !allow_star_schema_join(root, outer_path, inner_path)) ||
! 		 have_dangerous_phv(root,
! 							outer_path->parent->relids,
! 							PATH_REQ_OUTER(inner_path))))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
--- 429,452 ----
  	 * says to allow it anyway.  Also, we must reject if have_dangerous_phv
  	 * doesn't like the look of it, which could only happen if the nestloop is
  	 * still parameterized.
+ 	 *
+ 	 * Grouped path should never be parameterized.
  	 */
! 	required_outer = calc_nestloop_required_outer(outerrelids, outer_paramrels,
! 												  innerrelids, inner_paramrels);
! 	if (required_outer)
  	{
! 		if (grouped ||
! 			(!bms_overlap(required_outer, extra->param_source_rels) &&
! 			 !allow_star_schema_join(root, outerrelids, inner_paramrels)) ||
! 			have_dangerous_phv(root,
! 							   outer_path->parent->relids,
! 							   PATH_REQ_OUTER(inner_path)))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
*************** try_nestloop_path(PlannerInfo *root,
*** 368,388 ****
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_nestloop_path(root,
! 									  joinrel,
! 									  jointype,
! 									  &workspace,
! 									  extra,
! 									  outer_path,
! 									  inner_path,
! 									  extra->restrictlist,
! 									  pathkeys,
! 									  required_outer));
  	}
  	else
  	{
--- 461,522 ----
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
  
!  	/*
!  	 * Determine which target the join should produce.
!  	 *
!  	 * In the case of explicit aggregation, output of the join itself is
!  	 * plain.
!  	 */
!  	if (!grouped || do_aggregate)
!  		join_target = joinrel->reltarget;
!  	else
!  		join_target = joinrel->gpi->target;
! 
!  	join_path = (Path *) create_nestloop_path(root, joinrel, jointype,
!  											  &workspace, extra,
!  											  outer_path, inner_path,
!  											  extra->restrictlist, pathkeys,
!  											  required_outer, join_target);
! 
!  	/* Do partial aggregation if needed. */
!  	if (do_aggregate && required_outer == NULL)
!  	{
!  		create_grouped_path(root, joinrel, join_path, true, false,
!  							AGG_HASHED);
!  		create_grouped_path(root, joinrel, join_path, true, false,
!  							AGG_SORTED);
!  	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer, grouped))
  	{
! 		/*
! 		 * Since result produced by a child is part of the result produced by
! 		 * its topmost parent and has same properties, the parameters
! 		 * representing that parent may be substituted by values from a child.
! 		 * Hence expressions and hence paths using those expressions,
! 		 * parameterized by a parent can be said to be parameterized by any of
! 		 * its children.  For a join between child relations, if the inner path
! 		 * is parameterized by the parent of the outer relation, create a
! 		 * nestloop join path with inner relation parameterized by the outer
! 		 * relation by translating the inner path to be parameterized by the
! 		 * outer child relation. The translated path should have the same costs
! 		 * as the original path, so cost check above should still hold.
! 		 */
! 		if (PATH_PARAM_BY_PARENT(inner_path, outer_path->parent))
! 		{
! 			inner_path = reparameterize_path_by_child(root, inner_path,
! 													   outer_path->parent);
! 
! 			/*
! 			 * If we could not translate the path, we can't create nest loop
! 			 * path.
! 			 */
! 			if (!inner_path)
! 				return;
! 		}
! 
! 		add_path(joinrel, join_path, grouped);
  	}
  	else
  	{
*************** try_partial_nestloop_path(PlannerInfo *r
*** 403,411 ****
  						  Path *inner_path,
  						  List *pathkeys,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
--- 537,553 ----
  						  Path *inner_path,
  						  List *pathkeys,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool grouped,
! 						  bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_nestloop_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
*************** try_partial_nestloop_path(PlannerInfo *r
*** 428,448 ****
  	 */
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_nestloop_path(root,
! 										  joinrel,
! 										  jointype,
! 										  &workspace,
! 										  extra,
! 										  outer_path,
! 										  inner_path,
! 										  extra->restrictlist,
! 										  pathkeys,
! 										  NULL));
  }
  
  /*
--- 570,650 ----
  	 */
  	initial_cost_nestloop(root, &workspace, jointype,
  						  outer_path, inner_path, extra);
! 
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_nestloop_path(root, joinrel, jointype,
! 											  &workspace, extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist, pathkeys,
! 											  NULL, join_target);
! 
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_SORTED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   pathkeys, grouped))
! 	{
! 		/* Might be good enough to be worth trying, so let's try it. */
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! static void
! try_grouped_nestloop_path(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *pathkeys,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool do_aggregate,
! 						  bool partial)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_nestloop_path(root, joinrel, outer_path, inner_path, pathkeys,
! 						  jointype, extra, true, do_aggregate);
! 	else
! 		try_partial_nestloop_path(root, joinrel, outer_path, inner_path,
! 								  pathkeys, jointype, extra, true,
! 								  do_aggregate);
  }
  
  /*
*************** try_mergejoin_path(PlannerInfo *root,
*** 461,470 ****
  				   List *innersortkeys,
  				   JoinType jointype,
  				   JoinPathExtraData *extra,
! 				   bool is_partial)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	if (is_partial)
  	{
--- 663,682 ----
  				   List *innersortkeys,
  				   JoinType jointype,
  				   JoinPathExtraData *extra,
! 				   bool is_partial,
! 				   bool grouped,
! 				   bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	if (is_partial)
  	{
*************** try_mergejoin_path(PlannerInfo *root,
*** 477,498 ****
  								   outersortkeys,
  								   innersortkeys,
  								   jointype,
! 								   extra);
  		return;
  	}
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if the
! 	 * parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path,
! 													  inner_path);
! 	if (required_outer &&
! 		!bms_overlap(required_outer, extra->param_source_rels))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
--- 689,713 ----
  								   outersortkeys,
  								   innersortkeys,
  								   jointype,
! 								   extra,
! 								   grouped,
! 								   do_aggregate);
  		return;
  	}
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if
! 	 * it's grouped or if the parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path, inner_path);
! 	if (required_outer)
  	{
! 		if (grouped || !bms_overlap(required_outer, extra->param_source_rels))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
*************** try_mergejoin_path(PlannerInfo *root,
*** 511,537 ****
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys,
! 						   extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_mergejoin_path(root,
! 									   joinrel,
! 									   jointype,
! 									   &workspace,
! 									   extra,
! 									   outer_path,
! 									   inner_path,
! 									   extra->restrictlist,
! 									   pathkeys,
! 									   required_outer,
! 									   mergeclauses,
! 									   outersortkeys,
! 									   innersortkeys));
  	}
  	else
  	{
--- 726,773 ----
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 		join_target = joinrel->gpi->target;
! 
! 
! 	join_path = (Path *) create_mergejoin_path(root,
! 											   joinrel,
! 											   jointype,
! 											   &workspace,
! 											   extra,
! 											   outer_path,
! 											   inner_path,
! 											   extra->restrictlist,
! 											   pathkeys,
! 											   required_outer,
! 											   mergeclauses,
! 											   outersortkeys,
! 											   innersortkeys,
! 											   join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_SORTED);
! 	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  pathkeys, required_outer, grouped))
  	{
! 		add_path(joinrel, (Path *) join_path, grouped);
  	}
  	else
  	{
*************** try_partial_mergejoin_path(PlannerInfo *
*** 555,563 ****
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * See comments in try_partial_hashjoin_path().
--- 791,807 ----
  						   List *outersortkeys,
  						   List *innersortkeys,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped,
! 						   bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_mergejoin_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * See comments in try_partial_hashjoin_path().
*************** try_partial_mergejoin_path(PlannerInfo *
*** 587,613 ****
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys,
! 						   extra);
  
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, pathkeys))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_mergejoin_path(root,
! 										   joinrel,
! 										   jointype,
! 										   &workspace,
! 										   extra,
! 										   outer_path,
! 										   inner_path,
! 										   extra->restrictlist,
! 										   pathkeys,
! 										   NULL,
! 										   mergeclauses,
! 										   outersortkeys,
! 										   innersortkeys));
  }
  
  /*
--- 831,1003 ----
  	 */
  	initial_cost_mergejoin(root, &workspace, jointype, mergeclauses,
  						   outer_path, inner_path,
! 						   outersortkeys, innersortkeys, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_mergejoin_path(root,
! 											   joinrel,
! 											   jointype,
! 											   &workspace,
! 											   extra,
! 											   outer_path,
! 											   inner_path,
! 											   extra->restrictlist,
! 											   pathkeys,
! 											   NULL,
! 											   mergeclauses,
! 											   outersortkeys,
! 											   innersortkeys,
! 											   join_target);
! 
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_SORTED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   pathkeys, grouped))
! 	{
! 		/* Might be good enough to be worth trying, so let's try it. */
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! static void
! try_grouped_mergejoin_path(PlannerInfo *root,
! 						   RelOptInfo *joinrel,
! 						   Path *outer_path,
! 						   Path *inner_path,
! 						   List *pathkeys,
! 						   List *mergeclauses,
! 						   List *outersortkeys,
! 						   List *innersortkeys,
! 						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool partial,
! 						   bool do_aggregate)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_mergejoin_path(root, joinrel, outer_path, inner_path, pathkeys,
! 						   mergeclauses, outersortkeys, innersortkeys,
! 						   jointype, extra, false, true, do_aggregate);
! 	else
! 		try_partial_mergejoin_path(root, joinrel, outer_path, inner_path,
! 								   pathkeys,
! 								   mergeclauses, outersortkeys, innersortkeys,
! 								   jointype, extra, true, do_aggregate);
! }
! 
! static void
! try_mergejoin_path_common(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *pathkeys,
! 						  List *mergeclauses,
! 						  List *outersortkeys,
! 						  List *innersortkeys,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool partial,
! 						  bool grouped_outer,
! 						  bool grouped_inner,
! 						  bool do_aggregate)
! {
! 	bool		grouped_join;
! 
! 	grouped_join = grouped_outer || grouped_inner || do_aggregate;
! 
! 	/* Join of two grouped paths is not supported. */
! 	Assert(!(grouped_outer && grouped_inner));
! 
! 	if (!grouped_join)
! 	{
! 		/* Only join plain paths. */
! 		try_mergejoin_path(root,
! 						   joinrel,
! 						   outer_path,
! 						   inner_path,
! 						   pathkeys,
! 						   mergeclauses,
! 						   outersortkeys,
! 						   innersortkeys,
! 						   jointype,
! 						   extra,
! 						   partial,
! 						   false, false);
! 	}
! 	else if (grouped_outer || grouped_inner)
! 	{
! 		Assert(!do_aggregate);
! 
! 		/*
! 		 * Exactly one of the input paths is grouped, so create a grouped join
! 		 * path.
! 		 */
! 		try_grouped_mergejoin_path(root,
! 								   joinrel,
! 								   outer_path,
! 								   inner_path,
! 								   pathkeys,
! 								   mergeclauses,
! 								   outersortkeys,
! 								   innersortkeys,
! 								   jointype,
! 								   extra,
! 								   partial,
! 								   false);
! 	}
! 	/* Perform explicit aggregation only if suitable target exists. */
! 	else if (joinrel->gpi != NULL)
! 	{
! 		try_grouped_mergejoin_path(root,
! 								   joinrel,
! 								   outer_path,
! 								   inner_path,
! 								   pathkeys,
! 								   mergeclauses,
! 								   outersortkeys,
! 								   innersortkeys,
! 								   jointype,
! 								   extra,
! 								   partial, true);
! 	}
  }
  
  /*
*************** try_hashjoin_path(PlannerInfo *root,
*** 622,668 ****
  				  Path *inner_path,
  				  List *hashclauses,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if the
! 	 * parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path,
! 													  inner_path);
! 	if (required_outer &&
! 		!bms_overlap(required_outer, extra->param_source_rels))
  	{
! 		/* Waste no memory when we reject a path here */
! 		bms_free(required_outer);
! 		return;
  	}
  
  	/*
  	 * See comments in try_nestloop_path().  Also note that hashjoin paths
  	 * never have any output pathkeys, per comments in create_hashjoin_path.
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
  
! 	if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  NIL, required_outer))
  	{
! 		add_path(joinrel, (Path *)
! 				 create_hashjoin_path(root,
! 									  joinrel,
! 									  jointype,
! 									  &workspace,
! 									  extra,
! 									  outer_path,
! 									  inner_path,
! 									  extra->restrictlist,
! 									  required_outer,
! 									  hashclauses));
  	}
  	else
  	{
--- 1012,1086 ----
  				  Path *inner_path,
  				  List *hashclauses,
  				  JoinType jointype,
! 				  JoinPathExtraData *extra,
! 				  bool grouped,
! 				  bool do_aggregate)
  {
  	Relids		required_outer;
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* Caller should not request aggregation w/o grouped output. */
+ 	Assert(!do_aggregate || grouped);
+ 
+ 	/* GroupedPathInfo is necessary for us to produce a grouped set. */
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
! 	 * Check to see if proposed path is still parameterized, and reject if
! 	 * it's grouped or if the parameterization wouldn't be sensible.
  	 */
! 	required_outer = calc_non_nestloop_required_outer(outer_path, inner_path);
! 	if (required_outer)
  	{
! 		if (grouped || !bms_overlap(required_outer, extra->param_source_rels))
! 		{
! 			/* Waste no memory when we reject a path here */
! 			bms_free(required_outer);
! 			return;
! 		}
  	}
  
  	/*
  	 * See comments in try_nestloop_path().  Also note that hashjoin paths
  	 * never have any output pathkeys, per comments in create_hashjoin_path.
+ 	 *
+ 	 * TODO Need to consider aggregation here?
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
  
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 		join_target = joinrel->gpi->target;
! 
! 	join_path = (Path *) create_hashjoin_path(root, joinrel, jointype,
! 											  &workspace,
! 											  extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist,
! 											  required_outer, hashclauses,
! 											  join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, false,
! 								  AGG_HASHED);
! 	}
! 	else if (add_path_precheck(joinrel,
  						  workspace.startup_cost, workspace.total_cost,
! 						  NIL, required_outer, grouped))
  	{
! 		add_path(joinrel, (Path *) join_path, grouped);
  	}
  	else
  	{
*************** try_partial_hashjoin_path(PlannerInfo *r
*** 683,691 ****
  						  Path *inner_path,
  						  List *hashclauses,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra)
  {
  	JoinCostWorkspace workspace;
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
--- 1101,1117 ----
  						  Path *inner_path,
  						  List *hashclauses,
  						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool grouped,
! 						  bool do_aggregate)
  {
  	JoinCostWorkspace workspace;
+ 	Path		*join_path;
+ 	PathTarget	*join_target;
+ 
+ 	/* The same checks we do in try_hashjoin_path. */
+ 	Assert(!do_aggregate || grouped);
+ 	Assert(joinrel->gpi != NULL || !grouped);
  
  	/*
  	 * If the inner path is parameterized, the parameterization must be fully
*************** try_partial_hashjoin_path(PlannerInfo *r
*** 708,728 ****
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
! 	if (!add_partial_path_precheck(joinrel, workspace.total_cost, NIL))
  		return;
  
! 	/* Might be good enough to be worth trying, so let's try it. */
! 	add_partial_path(joinrel, (Path *)
! 					 create_hashjoin_path(root,
! 										  joinrel,
! 										  jointype,
! 										  &workspace,
! 										  extra,
! 										  outer_path,
! 										  inner_path,
! 										  extra->restrictlist,
! 										  NULL,
! 										  hashclauses));
  }
  
  /*
--- 1134,1229 ----
  	 */
  	initial_cost_hashjoin(root, &workspace, jointype, hashclauses,
  						  outer_path, inner_path, extra);
! 
! 	/*
! 	 * Determine which target the join should produce.
! 	 *
! 	 * In the case of explicit aggregation, output of the join itself is
! 	 * plain.
! 	 */
! 	if (!grouped || do_aggregate)
! 		join_target = joinrel->reltarget;
! 	else
! 	{
! 		Assert(joinrel->gpi != NULL);
! 		join_target = joinrel->gpi->target;
! 	}
! 
! 	join_path = (Path *) create_hashjoin_path(root, joinrel, jointype,
! 											  &workspace,
! 											  extra,
! 											  outer_path, inner_path,
! 											  extra->restrictlist, NULL,
! 											  hashclauses, join_target);
! 
! 	/* Do partial aggregation if needed. */
! 	if (do_aggregate)
! 	{
! 		create_grouped_path(root, joinrel, join_path, true, true, AGG_HASHED);
! 	}
! 	else if (add_partial_path_precheck(joinrel, workspace.total_cost,
! 									   NIL, grouped))
! 	{
! 		add_partial_path(joinrel, (Path *) join_path, grouped);
! 	}
! }
! 
! /*
!  * Create a new grouped hash join path by joining a grouped path to plain
!  * (non-grouped) one, or by joining 2 plain relations and applying grouping on
!  * the result.
!  *
!  * Joining of 2 grouped paths is not supported. If a grouped relation A was
!  * joined to grouped relation B, then the grouping of B reduces the number of
!  * times each group of A appears in the join output. This makes a difference
!  * for some aggregates, e.g. sum().
!  *
!  * If do_aggregate is true, neither input rel is grouped so we need to
!  * aggregate the join result explicitly.
!  *
!  * partial argument tells whether the join path should be considered partial.
!  */
! static void
! try_grouped_hashjoin_path(PlannerInfo *root,
! 						  RelOptInfo *joinrel,
! 						  Path *outer_path,
! 						  Path *inner_path,
! 						  List *hashclauses,
! 						  JoinType jointype,
! 						  JoinPathExtraData *extra,
! 						  bool do_aggregate,
! 						  bool partial)
! {
! 	/*
! 	 * Missing GroupedPathInfo indicates that we should not try to create a
! 	 * grouped join.
! 	 */
! 	if (joinrel->gpi == NULL)
  		return;
  
! 	/*
! 	 * Reject the path if we're supposed to combine grouped and plain relation
! 	 * but the grouped one does not evaluate all the relevant aggregates.
! 	 */
! 	if (!do_aggregate &&
! 		!is_grouped_join_target_complete(root, joinrel->gpi->target,
! 										 outer_path, inner_path))
! 		return;
! 
! 	/*
! 	 * As repeated aggregation doesn't seem to be attractive, make sure that
! 	 * the resulting grouped relation is not parameterized.
! 	 */
! 	if (outer_path->param_info != NULL || inner_path->param_info != NULL)
! 		return;
! 
! 	if (!partial)
! 		try_hashjoin_path(root, joinrel, outer_path, inner_path, hashclauses,
! 						  jointype, extra, true, do_aggregate);
! 	else
! 		try_partial_hashjoin_path(root, joinrel, outer_path, inner_path,
! 								  hashclauses, jointype, extra, true,
! 								  do_aggregate);
  }
  
  /*
*************** sort_inner_and_outer(PlannerInfo *root,
*** 773,779 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	Path	   *outer_path;
--- 1274,1313 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
! {
! 	if (!grouped)
! 	{
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, false, false);
! 	}
! 	else
! 	{
! 		/* Use all the supported strategies to generate grouped join. */
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, true, false, false);
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, true, false);
! 		sort_inner_and_outer_common(root, joinrel, outerrel, innerrel,
! 									jointype, extra, false, false, true);
! 	}
! }
! 
! /*
!  * TODO As merge_pathkeys shouldn't differ across execution, use a separate
!  * function to derive them and pass them here in a list.
!  */
! static void
! sort_inner_and_outer_common(PlannerInfo *root,
! 							RelOptInfo *joinrel,
! 							RelOptInfo *outerrel,
! 							RelOptInfo *innerrel,
! 							JoinType jointype,
! 							JoinPathExtraData *extra,
! 							bool grouped_outer,
! 							bool grouped_inner,
! 							bool do_aggregate)
  {
  	JoinType	save_jointype = jointype;
  	Path	   *outer_path;
*************** sort_inner_and_outer(PlannerInfo *root,
*** 782,787 ****
--- 1316,1322 ----
  	Path	   *cheapest_safe_inner = NULL;
  	List	   *all_pathkeys;
  	ListCell   *l;
+ 	bool	grouped_result;
  
  	/*
  	 * We only consider the cheapest-total-cost input paths, since we are
*************** sort_inner_and_outer(PlannerInfo *root,
*** 796,803 ****
  	 * against mergejoins with parameterized inputs; see comments in
  	 * src/backend/optimizer/README.
  	 */
! 	outer_path = outerrel->cheapest_total_path;
! 	inner_path = innerrel->cheapest_total_path;
  
  	/*
  	 * If either cheapest-total path is parameterized by the other rel, we
--- 1331,1357 ----
  	 * against mergejoins with parameterized inputs; see comments in
  	 * src/backend/optimizer/README.
  	 */
! 	if (grouped_outer)
! 	{
! 		if (outerrel->gpi != NULL && outerrel->gpi->pathlist != NIL)
! 			outer_path = linitial(outerrel->gpi->pathlist);
! 		else
! 			return;
! 	}
! 	else
! 		outer_path = outerrel->cheapest_total_path;
! 
! 	if (grouped_inner)
! 	{
! 		if (innerrel->gpi != NULL && innerrel->gpi->pathlist != NIL)
! 			inner_path = linitial(innerrel->gpi->pathlist);
! 		else
! 			return;
! 	}
! 	else
! 		inner_path = innerrel->cheapest_total_path;
! 
! 	grouped_result = grouped_outer || grouped_inner || do_aggregate;
  
  	/*
  	 * If either cheapest-total path is parameterized by the other rel, we
*************** sort_inner_and_outer(PlannerInfo *root,
*** 843,855 ****
  		outerrel->partial_pathlist != NIL &&
  		bms_is_empty(joinrel->lateral_relids))
  	{
! 		cheapest_partial_outer = (Path *) linitial(outerrel->partial_pathlist);
  
  		if (inner_path->parallel_safe)
  			cheapest_safe_inner = inner_path;
  		else if (save_jointype != JOIN_UNIQUE_INNER)
  			cheapest_safe_inner =
! 				get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  	}
  
  	/*
--- 1397,1446 ----
  		outerrel->partial_pathlist != NIL &&
  		bms_is_empty(joinrel->lateral_relids))
  	{
! 		if (grouped_outer)
! 		{
! 			if (outerrel->gpi != NULL && outerrel->gpi->partial_pathlist != NIL)
! 				cheapest_partial_outer = (Path *)
! 					linitial(outerrel->gpi->partial_pathlist);
! 			else
! 				return;
! 		}
! 		else
! 			cheapest_partial_outer = (Path *)
! 				linitial(outerrel->partial_pathlist);
! 
! 		if (grouped_inner)
! 		{
! 			if (innerrel->gpi != NULL && innerrel->gpi->pathlist != NIL)
! 				inner_path = linitial(innerrel->gpi->pathlist);
! 			else
! 				return;
! 		}
! 		else
! 			inner_path = innerrel->cheapest_total_path;
  
  		if (inner_path->parallel_safe)
  			cheapest_safe_inner = inner_path;
  		else if (save_jointype != JOIN_UNIQUE_INNER)
+ 		{
+ 			List	*inner_pathlist;
+ 
+ 			if (!grouped_inner)
+ 				inner_pathlist = innerrel->pathlist;
+ 			else
+ 			{
+ 				Assert(innerrel->gpi != NULL);
+ 				inner_pathlist = innerrel->gpi->pathlist;
+ 			}
+ 
+ 			/*
+ 			 * All the grouped paths should be unparameterized, so the
+ 			 * function is overly stringent in the grouped_inner case, but
+ 			 * still useful.
+ 			 */
  			cheapest_safe_inner =
! 				get_cheapest_parallel_safe_total_inner(inner_pathlist);
! 		}
  	}
  
  	/*
*************** sort_inner_and_outer(PlannerInfo *root,
*** 925,957 ****
  		 * properly.  try_mergejoin_path will detect that case and suppress an
  		 * explicit sort step, so we needn't do so here.
  		 */
! 		try_mergejoin_path(root,
! 						   joinrel,
! 						   outer_path,
! 						   inner_path,
! 						   merge_pathkeys,
! 						   cur_mergeclauses,
! 						   outerkeys,
! 						   innerkeys,
! 						   jointype,
! 						   extra,
! 						   false);
  
  		/*
  		 * If we have partial outer and parallel safe inner path then try
  		 * partial mergejoin path.
  		 */
  		if (cheapest_partial_outer && cheapest_safe_inner)
! 			try_partial_mergejoin_path(root,
! 									   joinrel,
! 									   cheapest_partial_outer,
! 									   cheapest_safe_inner,
! 									   merge_pathkeys,
! 									   cur_mergeclauses,
! 									   outerkeys,
! 									   innerkeys,
! 									   jointype,
! 									   extra);
  	}
  }
  
--- 1516,1574 ----
  		 * properly.  try_mergejoin_path will detect that case and suppress an
  		 * explicit sort step, so we needn't do so here.
  		 */
! 		if (!grouped_result)
! 			try_mergejoin_path(root,
! 							   joinrel,
! 							   outer_path,
! 							   inner_path,
! 							   merge_pathkeys,
! 							   cur_mergeclauses,
! 							   outerkeys,
! 							   innerkeys,
! 							   jointype,
! 							   extra,
! 							   false, false, false);
! 		else
! 		{
! 			try_mergejoin_path_common(root, joinrel, outer_path, inner_path,
! 									  merge_pathkeys, cur_mergeclauses,
! 									  outerkeys, innerkeys, jointype, extra,
! 									  false,
! 									  grouped_outer, grouped_inner,
! 									  do_aggregate);
! 		}
  
  		/*
  		 * If we have partial outer and parallel safe inner path then try
  		 * partial mergejoin path.
  		 */
  		if (cheapest_partial_outer && cheapest_safe_inner)
! 		{
! 			if (!grouped_result)
! 			{
! 				try_partial_mergejoin_path(root,
! 										   joinrel,
! 										   cheapest_partial_outer,
! 										   cheapest_safe_inner,
! 										   merge_pathkeys,
! 										   cur_mergeclauses,
! 										   outerkeys,
! 										   innerkeys,
! 										   jointype,
! 										   extra, false, false);
! 			}
! 			else
! 			{
! 				try_mergejoin_path_common(root, joinrel,
! 										  cheapest_partial_outer,
! 										  cheapest_safe_inner,
! 										  merge_pathkeys, cur_mergeclauses,
! 										  outerkeys, innerkeys, jointype, extra,
! 										  true,
! 										  grouped_outer, grouped_inner,
! 										  do_aggregate);
! 			}
! 		}
  	}
  }
  
*************** sort_inner_and_outer(PlannerInfo *root,
*** 968,973 ****
--- 1585,1598 ----
   * some sort key requirements).  So, we consider truncations of the
   * mergeclause list as well as the full list.  (Ideally we'd consider all
   * subsets of the mergeclause list, but that seems way too expensive.)
+  *
+  * grouped_outer - is outerpath grouped?
+  * grouped_inner - use grouped paths of innerrel?
+  * do_aggregate - apply (partial) aggregation to the output?
+  *
+  * TODO If subsequent calls often differ only by the 3 arguments above,
+  * consider a workspace structure to share useful info (eg merge clauses)
+  * across calls.
   */
  static void
  generate_mergejoin_paths(PlannerInfo *root,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 979,985 ****
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial)
  {
  	List	   *mergeclauses;
  	List	   *innersortkeys;
--- 1604,1613 ----
  						 bool useallclauses,
  						 Path *inner_cheapest_total,
  						 List *merge_pathkeys,
! 						 bool is_partial,
! 						 bool grouped_outer,
! 						 bool grouped_inner,
! 						 bool do_aggregate)
  {
  	List	   *mergeclauses;
  	List	   *innersortkeys;
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1030,1046 ****
  	 * try_mergejoin_path will do the right thing if inner_cheapest_total is
  	 * already correctly sorted.)
  	 */
! 	try_mergejoin_path(root,
! 					   joinrel,
! 					   outerpath,
! 					   inner_cheapest_total,
! 					   merge_pathkeys,
! 					   mergeclauses,
! 					   NIL,
! 					   innersortkeys,
! 					   jointype,
! 					   extra,
! 					   is_partial);
  
  	/* Can't do anything else if inner path needs to be unique'd */
  	if (save_jointype == JOIN_UNIQUE_INNER)
--- 1658,1675 ----
  	 * try_mergejoin_path will do the right thing if inner_cheapest_total is
  	 * already correctly sorted.)
  	 */
! 	try_mergejoin_path_common(root,
! 							  joinrel,
! 							  outerpath,
! 							  inner_cheapest_total,
! 							  merge_pathkeys,
! 							  mergeclauses,
! 							  NIL,
! 							  innersortkeys,
! 							  jointype,
! 							  extra,
! 							  is_partial,
! 							  grouped_outer, grouped_inner, do_aggregate);
  
  	/* Can't do anything else if inner path needs to be unique'd */
  	if (save_jointype == JOIN_UNIQUE_INNER)
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1096,1111 ****
  
  	for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
  	{
  		Path	   *innerpath;
  		List	   *newclauses = NIL;
  
  		/*
  		 * Look for an inner path ordered well enough for the first
  		 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
  		 * destructively, which is why we made a copy...
  		 */
  		trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
! 		innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
  												   trialsortkeys,
  												   NULL,
  												   TOTAL_COST,
--- 1725,1746 ----
  
  	for (sortkeycnt = num_sortkeys; sortkeycnt > 0; sortkeycnt--)
  	{
+ 		List		*inner_pathlist = NIL;
  		Path	   *innerpath;
  		List	   *newclauses = NIL;
  
+ 		if (!grouped_inner)
+ 			inner_pathlist = innerrel->pathlist;
+ 		else if (innerrel->gpi != NULL)
+ 			inner_pathlist = innerrel->gpi->pathlist;
+ 
  		/*
  		 * Look for an inner path ordered well enough for the first
  		 * 'sortkeycnt' innersortkeys.  NB: trialsortkeys list is modified
  		 * destructively, which is why we made a copy...
  		 */
  		trialsortkeys = list_truncate(trialsortkeys, sortkeycnt);
! 		innerpath = get_cheapest_path_for_pathkeys(inner_pathlist,
  												   trialsortkeys,
  												   NULL,
  												   TOTAL_COST,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1128,1148 ****
  			}
  			else
  				newclauses = mergeclauses;
! 			try_mergejoin_path(root,
! 							   joinrel,
! 							   outerpath,
! 							   innerpath,
! 							   merge_pathkeys,
! 							   newclauses,
! 							   NIL,
! 							   NIL,
! 							   jointype,
! 							   extra,
! 							   is_partial);
  			cheapest_total_inner = innerpath;
  		}
  		/* Same on the basis of cheapest startup cost ... */
! 		innerpath = get_cheapest_path_for_pathkeys(innerrel->pathlist,
  												   trialsortkeys,
  												   NULL,
  												   STARTUP_COST,
--- 1763,1787 ----
  			}
  			else
  				newclauses = mergeclauses;
! 
! 			try_mergejoin_path_common(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  merge_pathkeys,
! 									  newclauses,
! 									  NIL,
! 									  NIL,
! 									  jointype,
! 									  extra,
! 									  is_partial,
! 									  grouped_outer, grouped_inner,
! 									  do_aggregate);
! 
  			cheapest_total_inner = innerpath;
  		}
  		/* Same on the basis of cheapest startup cost ... */
! 		innerpath = get_cheapest_path_for_pathkeys(inner_pathlist,
  												   trialsortkeys,
  												   NULL,
  												   STARTUP_COST,
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1173,1189 ****
  					else
  						newclauses = mergeclauses;
  				}
! 				try_mergejoin_path(root,
! 								   joinrel,
! 								   outerpath,
! 								   innerpath,
! 								   merge_pathkeys,
! 								   newclauses,
! 								   NIL,
! 								   NIL,
! 								   jointype,
! 								   extra,
! 								   is_partial);
  			}
  			cheapest_startup_inner = innerpath;
  		}
--- 1812,1830 ----
  					else
  						newclauses = mergeclauses;
  				}
! 				try_mergejoin_path_common(root,
! 										  joinrel,
! 										  outerpath,
! 										  innerpath,
! 										  merge_pathkeys,
! 										  newclauses,
! 										  NIL,
! 										  NIL,
! 										  jointype,
! 										  extra,
! 										  is_partial,
! 										  grouped_outer, grouped_inner,
! 										  do_aggregate);
  			}
  			cheapest_startup_inner = innerpath;
  		}
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1218,1223 ****
--- 1859,1866 ----
   * 'innerrel' is the inner join relation
   * 'jointype' is the type of join to do
   * 'extra' contains additional input values
+  * 'grouped' indicates that at least one relation in the join has been
+  * aggregated.
   */
  static void
  match_unsorted_outer(PlannerInfo *root,
*************** match_unsorted_outer(PlannerInfo *root,
*** 1225,1231 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	bool		nestjoinOK;
--- 1868,1875 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
  {
  	JoinType	save_jointype = jointype;
  	bool		nestjoinOK;
*************** match_unsorted_outer(PlannerInfo *root,
*** 1235,1240 ****
--- 1879,1906 ----
  	ListCell   *lc1;
  
  	/*
+ 	 * If grouped join path is requested, we ignore cases where either input
+ 	 * path needs to be unique. For each side we should expect either grouped
+ 	 * or plain relation, which differ quite a bit.
+ 	 *
+ 	 * XXX Although unique-ification of grouped path might result in too
+ 	 * expensive input path (note that grouped input relation is not
+ 	 * necessarily unique, regardless the grouping keys --- one or more plain
+ 	 * relation could already have been joined to it), we might want to
+ 	 * unique-ify the input relation in the future at least in the case it's a
+ 	 * plain relation.
+ 	 *
+ 	 * (Materialization is not involved in grouped paths for similar reasons.)
+ 	 */
+ 	if (grouped &&
+ 		(jointype == JOIN_UNIQUE_OUTER || jointype == JOIN_UNIQUE_INNER))
+ 		return;
+ 
+ 	/* No grouped join w/o grouped target. */
+ 	if (grouped && joinrel->gpi == NULL)
+ 		return;
+ 
+ 	/*
  	 * Nestloop only supports inner, left, semi, and anti joins.  Also, if we
  	 * are doing a right or full mergejoin, we must use *all* the mergeclauses
  	 * as join clauses, else we will not have a valid plan.  (Although these
*************** match_unsorted_outer(PlannerInfo *root,
*** 1290,1296 ****
  			create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
  		Assert(inner_cheapest_total);
  	}
! 	else if (nestjoinOK)
  	{
  		/*
  		 * Consider materializing the cheapest inner path, unless
--- 1956,1962 ----
  			create_unique_path(root, innerrel, inner_cheapest_total, extra->sjinfo);
  		Assert(inner_cheapest_total);
  	}
! 	else if (nestjoinOK && !grouped)
  	{
  		/*
  		 * Consider materializing the cheapest inner path, unless
*************** match_unsorted_outer(PlannerInfo *root,
*** 1321,1326 ****
--- 1987,1994 ----
  		 */
  		if (save_jointype == JOIN_UNIQUE_OUTER)
  		{
+ 			Assert(!grouped);
+ 
  			if (outerpath != outerrel->cheapest_total_path)
  				continue;
  			outerpath = (Path *) create_unique_path(root, outerrel,
*************** match_unsorted_outer(PlannerInfo *root,
*** 1348,1354 ****
  							  inner_cheapest_total,
  							  merge_pathkeys,
  							  jointype,
! 							  extra);
  		}
  		else if (nestjoinOK)
  		{
--- 2016,2023 ----
  							  inner_cheapest_total,
  							  merge_pathkeys,
  							  jointype,
! 							  extra,
! 							  false, false);
  		}
  		else if (nestjoinOK)
  		{
*************** match_unsorted_outer(PlannerInfo *root,
*** 1364,1387 ****
  			{
  				Path	   *innerpath = (Path *) lfirst(lc2);
  
! 				try_nestloop_path(root,
! 								  joinrel,
! 								  outerpath,
! 								  innerpath,
! 								  merge_pathkeys,
! 								  jointype,
! 								  extra);
  			}
  
! 			/* Also consider materialized form of the cheapest inner path */
! 			if (matpath != NULL)
  				try_nestloop_path(root,
  								  joinrel,
  								  outerpath,
  								  matpath,
  								  merge_pathkeys,
  								  jointype,
! 								  extra);
  		}
  
  		/* Can't do anything else if outer path needs to be unique'd */
--- 2033,2078 ----
  			{
  				Path	   *innerpath = (Path *) lfirst(lc2);
  
! 				if (!grouped)
! 					try_nestloop_path(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  merge_pathkeys,
! 									  jointype,
! 									  extra, false, false);
! 				else
! 				{
! 					/*
! 					 * Since both input paths are plain, request explicit
! 					 * aggregation.
! 					 */
! 					try_grouped_nestloop_path(root,
! 											  joinrel,
! 											  outerpath,
! 											  innerpath,
! 											  merge_pathkeys,
! 											  jointype,
! 											  extra,
! 											  true,
! 											  false);
! 				}
  			}
  
! 			/*
! 			 * Also consider materialized form of the cheapest inner path.
! 			 *
! 			 * (There's no matpath for grouped join.)
! 			 */
! 			if (matpath != NULL && !grouped)
  				try_nestloop_path(root,
  								  joinrel,
  								  outerpath,
  								  matpath,
  								  merge_pathkeys,
  								  jointype,
! 								  extra,
! 								  false, false);
  		}
  
  		/* Can't do anything else if outer path needs to be unique'd */
*************** match_unsorted_outer(PlannerInfo *root,
*** 1396,1402 ****
  		generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
  								 save_jointype, extra, useallclauses,
  								 inner_cheapest_total, merge_pathkeys,
! 								 false);
  	}
  
  	/*
--- 2087,2163 ----
  		generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
  								 save_jointype, extra, useallclauses,
  								 inner_cheapest_total, merge_pathkeys,
! 								 false, false, false, grouped);
! 
! 		/* Try to join the plain outer relation to grouped inner. */
! 		if (grouped && nestjoinOK &&
! 			save_jointype != JOIN_UNIQUE_OUTER &&
! 			save_jointype != JOIN_UNIQUE_INNER &&
! 			innerrel->gpi != NULL && outerrel->gpi == NULL)
! 		{
! 			Path	*inner_cheapest_grouped = (Path *) linitial(innerrel->gpi->pathlist);
! 
! 			if (PATH_PARAM_BY_REL(inner_cheapest_grouped, outerrel))
! 				continue;
! 
! 			/* XXX Shouldn't Assert() be used here instead? */
! 			if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 				continue;
! 
! 			/*
! 			 * Only outer grouped path is interesting in this case: grouped
! 			 * path on the inner side of NL join would imply repeated
! 			 * aggregation somewhere in the inner path.
! 			 */
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 save_jointype, extra, useallclauses,
! 									 inner_cheapest_grouped, merge_pathkeys,
! 									 false, false, true, false);
! 		}
! 	}
! 
! 	/*
! 	 * Combine grouped outer and plain inner paths.
! 	 */
! 	if (grouped && nestjoinOK &&
! 		save_jointype != JOIN_UNIQUE_OUTER &&
! 		save_jointype != JOIN_UNIQUE_INNER)
! 	{
! 		/*
! 		 * If the inner rel had a grouped target, its plain paths should be
! 		 * ignored. Otherwise we could create grouped paths with different
! 		 * targets.
! 		 */
! 		if (outerrel->gpi != NULL && innerrel->gpi == NULL &&
! 			inner_cheapest_total != NULL)
! 		{
! 			/* Nested loop paths. */
! 			foreach(lc1, outerrel->gpi->pathlist)
! 			{
! 				Path	   *outerpath = (Path *) lfirst(lc1);
! 				List	*merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
! 															  outerpath->pathkeys);
! 
! 				if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 					continue;
! 
! 				try_grouped_nestloop_path(root,
! 										  joinrel,
! 										  outerpath,
! 										  inner_cheapest_total,
! 										  merge_pathkeys,
! 										  jointype,
! 										  extra,
! 										  false,
! 										  false);
! 
! 				/* Merge join paths. */
! 				generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 										 save_jointype, extra, useallclauses,
! 										 inner_cheapest_total, merge_pathkeys,
! 										 false, true, false, false);
! 			}
! 		}
  	}
  
  	/*
*************** match_unsorted_outer(PlannerInfo *root,
*** 1416,1423 ****
  		bms_is_empty(joinrel->lateral_relids))
  	{
  		if (nestjoinOK)
! 			consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 									   save_jointype, extra);
  
  		/*
  		 * If inner_cheapest_total is NULL or non parallel-safe then find the
--- 2177,2197 ----
  		bms_is_empty(joinrel->lateral_relids))
  	{
  		if (nestjoinOK)
! 		{
! 			if (!grouped)
! 				/* Plain partial paths. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 									   save_jointype, extra, false, false);
! 			else
! 			{
! 				/* Grouped partial paths with explicit aggregation. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 										   save_jointype, extra, true, true);
! 				/* Grouped partial paths w/o explicit aggregation. */
! 				consider_parallel_nestloop(root, joinrel, outerrel, innerrel,
! 										   save_jointype, extra, true, false);
! 			}
! 		}
  
  		/*
  		 * If inner_cheapest_total is NULL or non parallel-safe then find the
*************** match_unsorted_outer(PlannerInfo *root,
*** 1437,1443 ****
  		if (inner_cheapest_total)
  			consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
  										save_jointype, extra,
! 										inner_cheapest_total);
  	}
  }
  
--- 2211,2217 ----
  		if (inner_cheapest_total)
  			consider_parallel_mergejoin(root, joinrel, outerrel, innerrel,
  										save_jointype, extra,
! 										inner_cheapest_total, grouped);
  	}
  }
  
*************** consider_parallel_mergejoin(PlannerInfo
*** 1460,1469 ****
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total)
  {
  	ListCell   *lc1;
  
  	/* generate merge join path for each partial outer path */
  	foreach(lc1, outerrel->partial_pathlist)
  	{
--- 2234,2252 ----
  							RelOptInfo *innerrel,
  							JoinType jointype,
  							JoinPathExtraData *extra,
! 							Path *inner_cheapest_total,
! 							bool grouped)
  {
  	ListCell   *lc1;
  
+ 	if (grouped)
+ 	{
+ 		/* TODO Consider if these types should be supported. */
+ 		if (jointype == JOIN_UNIQUE_OUTER ||
+ 			jointype == JOIN_UNIQUE_INNER)
+ 			return;
+ 	}
+ 
  	/* generate merge join path for each partial outer path */
  	foreach(lc1, outerrel->partial_pathlist)
  	{
*************** consider_parallel_mergejoin(PlannerInfo
*** 1476,1484 ****
  		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
  											 outerpath->pathkeys);
  
! 		generate_mergejoin_paths(root, joinrel, innerrel, outerpath, jointype,
! 								 extra, false, inner_cheapest_total,
! 								 merge_pathkeys, true);
  	}
  }
  
--- 2259,2314 ----
  		merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
  											 outerpath->pathkeys);
  
! 		if (!grouped)
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true,
! 									 false, false, false);
! 		else
! 		{
! 			/*
! 			 * Create grouped join by joining plain rels and aggregating the
! 			 * result.
! 			 */
! 			Assert(joinrel->gpi != NULL);
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true, false, false, true);
! 
! 			/* Combine the plain outer with grouped inner one(s). */
! 			if (outerrel->gpi == NULL && innerrel->gpi != NULL)
! 			{
! 				Path	*inner_cheapest_grouped = (Path *)
! 					linitial(innerrel->gpi->pathlist);
! 
! 				if (inner_cheapest_grouped != NULL &&
! 					inner_cheapest_grouped->parallel_safe)
! 					generate_mergejoin_paths(root, joinrel, innerrel,
! 											 outerpath, jointype, extra,
! 											 false, inner_cheapest_grouped,
! 											 merge_pathkeys,
! 											 true, false, true, false);
! 			}
! 		}
! 	}
! 
! 	/* In addition, try to join grouped outer to plain inner one(s).  */
! 	if (grouped && outerrel->gpi != NULL && innerrel->gpi == NULL)
! 	{
! 		foreach(lc1, outerrel->gpi->partial_pathlist)
! 		{
! 			Path	   *outerpath = (Path *) lfirst(lc1);
! 			List	   *merge_pathkeys;
! 
! 			merge_pathkeys = build_join_pathkeys(root, joinrel, jointype,
! 												 outerpath->pathkeys);
! 			generate_mergejoin_paths(root, joinrel, innerrel, outerpath,
! 									 jointype, extra, false,
! 									 inner_cheapest_total, merge_pathkeys,
! 									 true, true, false, false);
! 		}
  	}
  }
  
*************** consider_parallel_nestloop(PlannerInfo *
*** 1499,1513 ****
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	ListCell   *lc1;
  
  	if (jointype == JOIN_UNIQUE_INNER)
  		jointype = JOIN_INNER;
  
! 	foreach(lc1, outerrel->partial_pathlist)
  	{
  		Path	   *outerpath = (Path *) lfirst(lc1);
  		List	   *pathkeys;
--- 2329,2373 ----
  						   RelOptInfo *outerrel,
  						   RelOptInfo *innerrel,
  						   JoinType jointype,
! 						   JoinPathExtraData *extra,
! 						   bool grouped, bool do_aggregate)
  {
  	JoinType	save_jointype = jointype;
+ 	List		*outer_pathlist;
  	ListCell   *lc1;
  
+ 	if (grouped)
+ 	{
+ 		/* TODO Consider if these types should be supported. */
+ 		if (save_jointype == JOIN_UNIQUE_OUTER ||
+ 			save_jointype == JOIN_UNIQUE_INNER)
+ 			return;
+ 	}
+ 
  	if (jointype == JOIN_UNIQUE_INNER)
  		jointype = JOIN_INNER;
  
! 	if (!grouped || do_aggregate)
! 	{
! 		/*
! 		 * If creating grouped paths by explicit aggregation, the input paths
! 		 * must be plain.
! 		 */
! 		outer_pathlist = outerrel->partial_pathlist;
! 	}
! 	else if (outerrel->gpi != NULL)
! 	{
! 		/*
! 		 * Only the outer paths are accepted as grouped when we try to combine
! 		 * grouped and plain ones. Grouped inner path implies repeated
! 		 * aggregation, which doesn't sound like a good idea.
! 		 */
! 		outer_pathlist = outerrel->gpi->partial_pathlist;
! 	}
! 	else
! 		return;
! 
! 	foreach(lc1, outer_pathlist)
  	{
  		Path	   *outerpath = (Path *) lfirst(lc1);
  		List	   *pathkeys;
*************** consider_parallel_nestloop(PlannerInfo *
*** 1538,1544 ****
  			 * inner paths, but right now create_unique_path is not on board
  			 * with that.)
  			 */
! 			if (save_jointype == JOIN_UNIQUE_INNER)
  			{
  				if (innerpath != innerrel->cheapest_total_path)
  					continue;
--- 2398,2404 ----
  			 * inner paths, but right now create_unique_path is not on board
  			 * with that.)
  			 */
! 			if (save_jointype == JOIN_UNIQUE_INNER && !grouped)
  			{
  				if (innerpath != innerrel->cheapest_total_path)
  					continue;
*************** consider_parallel_nestloop(PlannerInfo *
*** 1548,1555 ****
  				Assert(innerpath);
  			}
  
! 			try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
! 									  pathkeys, jointype, extra);
  		}
  	}
  }
--- 2408,2433 ----
  				Assert(innerpath);
  			}
  
! 			if (!grouped)
! 				try_partial_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  false, false);
! 			else if (do_aggregate)
! 			{
! 				/* Request aggregation as both input rels are plain. */
! 				try_grouped_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  true, true);
! 			}
! 			/*
! 			 * Only combine the grouped outer path with the plain inner if the
! 			 * inner relation cannot produce grouped paths. Otherwise we could
! 			 * generate grouped paths with different targets.
! 			 */
! 			else if (innerrel->gpi == NULL)
! 				try_grouped_nestloop_path(root, joinrel, outerpath, innerpath,
! 										  pathkeys, jointype, extra,
! 										  false, true);
  		}
  	}
  }
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1571,1583 ****
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra)
  {
  	JoinType	save_jointype = jointype;
  	bool		isouterjoin = IS_OUTER_JOIN(jointype);
  	List	   *hashclauses;
  	ListCell   *l;
  
  	/*
  	 * We need to build only one hashclauses list for any given pair of outer
  	 * and inner relations; all of the hashable clauses will be used as keys.
--- 2449,2466 ----
  					 RelOptInfo *outerrel,
  					 RelOptInfo *innerrel,
  					 JoinType jointype,
! 					 JoinPathExtraData *extra,
! 					 bool grouped)
  {
  	JoinType	save_jointype = jointype;
  	bool		isouterjoin = IS_OUTER_JOIN(jointype);
  	List	   *hashclauses;
  	ListCell   *l;
  
+ 	/* No grouped join w/o grouped target. */
+ 	if (grouped && joinrel->gpi == NULL)
+ 		return;
+ 
  	/*
  	 * We need to build only one hashclauses list for any given pair of outer
  	 * and inner relations; all of the hashable clauses will be used as keys.
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1627,1632 ****
--- 2510,2518 ----
  		 * can't use a hashjoin.  (There's no use looking for alternative
  		 * input paths, since these should already be the least-parameterized
  		 * available paths.)
+ 		 *
+ 		 * (The same check should work for grouped paths, as these don't
+ 		 * differ in parameterization.)
  		 */
  		if (PATH_PARAM_BY_REL(cheapest_total_outer, innerrel) ||
  			PATH_PARAM_BY_REL(cheapest_total_inner, outerrel))
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1646,1652 ****
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra);
  			/* no possibility of cheap startup here */
  		}
  		else if (jointype == JOIN_UNIQUE_INNER)
--- 2532,2539 ----
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra,
! 							  false, false);
  			/* no possibility of cheap startup here */
  		}
  		else if (jointype == JOIN_UNIQUE_INNER)
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1662,1668 ****
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra);
  			if (cheapest_startup_outer != NULL &&
  				cheapest_startup_outer != cheapest_total_outer)
  				try_hashjoin_path(root,
--- 2549,2556 ----
  							  cheapest_total_inner,
  							  hashclauses,
  							  jointype,
! 							  extra,
! 							  false, false);
  			if (cheapest_startup_outer != NULL &&
  				cheapest_startup_outer != cheapest_total_outer)
  				try_hashjoin_path(root,
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1671,1733 ****
  								  cheapest_total_inner,
  								  hashclauses,
  								  jointype,
! 								  extra);
  		}
  		else
  		{
! 			/*
! 			 * For other jointypes, we consider the cheapest startup outer
! 			 * together with the cheapest total inner, and then consider
! 			 * pairings of cheapest-total paths including parameterized ones.
! 			 * There is no use in generating parameterized paths on the basis
! 			 * of possibly cheap startup cost, so this is sufficient.
! 			 */
! 			ListCell   *lc1;
! 			ListCell   *lc2;
! 
! 			if (cheapest_startup_outer != NULL)
! 				try_hashjoin_path(root,
! 								  joinrel,
! 								  cheapest_startup_outer,
! 								  cheapest_total_inner,
! 								  hashclauses,
! 								  jointype,
! 								  extra);
! 
! 			foreach(lc1, outerrel->cheapest_parameterized_paths)
  			{
- 				Path	   *outerpath = (Path *) lfirst(lc1);
- 
  				/*
! 				 * We cannot use an outer path that is parameterized by the
! 				 * inner rel.
  				 */
! 				if (PATH_PARAM_BY_REL(outerpath, innerrel))
! 					continue;
  
! 				foreach(lc2, innerrel->cheapest_parameterized_paths)
  				{
! 					Path	   *innerpath = (Path *) lfirst(lc2);
  
  					/*
! 					 * We cannot use an inner path that is parameterized by
! 					 * the outer rel, either.
  					 */
! 					if (PATH_PARAM_BY_REL(innerpath, outerrel))
  						continue;
  
! 					if (outerpath == cheapest_startup_outer &&
! 						innerpath == cheapest_total_inner)
! 						continue;		/* already tried it */
  
! 					try_hashjoin_path(root,
! 									  joinrel,
! 									  outerpath,
! 									  innerpath,
! 									  hashclauses,
! 									  jointype,
! 									  extra);
  				}
  			}
  		}
  
--- 2559,2712 ----
  								  cheapest_total_inner,
  								  hashclauses,
  								  jointype,
! 								  extra,
! 								  false, false);
  		}
  		else
  		{
! 			if (!grouped)
  			{
  				/*
! 				 * For other jointypes, we consider the cheapest startup outer
! 				 * together with the cheapest total inner, and then consider
! 				 * pairings of cheapest-total paths including parameterized
! 				 * ones.  There is no use in generating parameterized paths on
! 				 * the basis of possibly cheap startup cost, so this is
! 				 * sufficient.
  				 */
! 				ListCell   *lc1;
  
! 				if (cheapest_startup_outer != NULL)
! 					try_hashjoin_path(root,
! 									  joinrel,
! 									  cheapest_startup_outer,
! 									  cheapest_total_inner,
! 									  hashclauses,
! 									  jointype,
! 									  extra,
! 									  false, false);
! 
! 				foreach(lc1, outerrel->cheapest_parameterized_paths)
  				{
! 					Path	   *outerpath = (Path *) lfirst(lc1);
! 					ListCell   *lc2;
  
  					/*
! 					 * We cannot use an outer path that is parameterized by the
! 					 * inner rel.
  					 */
! 					if (PATH_PARAM_BY_REL(outerpath, innerrel))
  						continue;
  
! 					foreach(lc2, innerrel->cheapest_parameterized_paths)
! 					{
! 						Path	   *innerpath = (Path *) lfirst(lc2);
  
! 						/*
! 						 * We cannot use an inner path that is parameterized by
! 						 * the outer rel, either.
! 						 */
! 						if (PATH_PARAM_BY_REL(innerpath, outerrel))
! 							continue;
! 
! 						if (outerpath == cheapest_startup_outer &&
! 							innerpath == cheapest_total_inner)
! 							continue;		/* already tried it */
! 
! 						try_hashjoin_path(root,
! 										  joinrel,
! 										  outerpath,
! 										  innerpath,
! 										  hashclauses,
! 										  jointype,
! 										  extra,
! 										  false, false);
! 					}
! 				}
! 			}
! 			else
! 			{
! 				/* Create grouped paths if possible. */
! 				/*
! 				 * TODO
! 				 *
! 				 * Consider processing JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER
! 				 * join types, i.e. perform grouping of the inner / outer rel if
! 				 * it's not unique yet and if the grouping is legal.
! 				 */
! 				if (jointype == JOIN_UNIQUE_OUTER ||
! 					jointype == JOIN_UNIQUE_INNER)
! 					return;
! 
! 				/*
! 				 * Join grouped relation to non-grouped one.
! 				 *
! 				 * Do not use plain path of the input rel whose target does
! 				 * have GroupedPathInfo. For example (assuming that join of
! 				 * two grouped rels is not supported), the only way to
! 				 * evaluate SELECT sum(a.x), sum(b.y) ... is to join "a" and
! 				 * "b" and aggregate the result. Otherwise the path target
! 				 * wouldn't match joinrel->gpi->target. TODO Move this comment
! 				 * elsewhere as it seems common to all join kinds.
! 				 */
! 				/*
! 				 * TODO Allow outer join if the grouped rel is on the
! 				 * non-nullable side.
! 				 */
! 				if (jointype == JOIN_INNER)
! 				{
! 					Path	*grouped_path, *plain_path;
! 
! 					if (outerrel->gpi != NULL &&
! 						outerrel->gpi->pathlist != NIL &&
! 						innerrel->gpi == NULL)
! 					{
! 						grouped_path = (Path *)
! 							linitial(outerrel->gpi->pathlist);
! 						plain_path = cheapest_total_inner;
! 						try_grouped_hashjoin_path(root, joinrel,
! 												  grouped_path, plain_path,
! 												  hashclauses, jointype,
! 												  extra, false, false);
! 					}
! 					else if (innerrel->gpi != NULL &&
! 							 innerrel->gpi->pathlist != NIL &&
! 							 outerrel->gpi == NULL)
! 					{
! 						grouped_path = (Path *)
! 							linitial(innerrel->gpi->pathlist);
! 						plain_path = cheapest_total_outer;
! 						try_grouped_hashjoin_path(root, joinrel, plain_path,
! 												  grouped_path, hashclauses,
! 												  jointype, extra,
! 												  false, false);
! 
! 						if (cheapest_startup_outer != NULL &&
! 							cheapest_startup_outer != cheapest_total_outer)
! 						{
! 							plain_path = cheapest_startup_outer;
! 							try_grouped_hashjoin_path(root, joinrel,
! 													  plain_path,
! 													  grouped_path,
! 													  hashclauses,
! 													  jointype, extra,
! 													  false, false);
! 						}
! 					}
  				}
+ 
+ 				/*
+ 				 * Try to join plain relations and make a grouped rel out of
+ 				 * the join.
+ 				 *
+ 				 * Since aggregation needs the whole relation, we are only
+ 				 * interested in total costs.
+ 				 */
+ 				try_grouped_hashjoin_path(root, joinrel,
+ 										  cheapest_total_outer,
+ 										  cheapest_total_inner,
+ 										  hashclauses,
+ 										  jointype, extra, true, false);
  			}
  		}
  
*************** hash_inner_and_outer(PlannerInfo *root,
*** 1765,1777 ****
  				cheapest_safe_inner =
  					get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  
! 			if (cheapest_safe_inner != NULL)
! 				try_partial_hashjoin_path(root, joinrel,
! 										  cheapest_partial_outer,
! 										  cheapest_safe_inner,
! 										  hashclauses, jointype, extra);
  		}
  	}
  }
  
  /*
--- 2744,2967 ----
  				cheapest_safe_inner =
  					get_cheapest_parallel_safe_total_inner(innerrel->pathlist);
  
! 			if (!grouped)
! 			{
! 				if (cheapest_safe_inner != NULL)
! 					try_partial_hashjoin_path(root, joinrel,
! 											  cheapest_partial_outer,
! 											  cheapest_safe_inner,
! 											  hashclauses, jointype, extra,
! 											  false, false);
! 			}
! 			else if (joinrel->gpi != NULL)
! 			{
! 				/*
! 				 * Grouped partial path.
! 				 *
! 				 * 1. Apply aggregation to the plain partial join path.
! 				 */
! 				if (cheapest_safe_inner != NULL)
! 					try_grouped_hashjoin_path(root, joinrel,
! 											  cheapest_partial_outer,
! 											  cheapest_safe_inner,
! 											  hashclauses,
! 											  jointype, extra, true, true);
! 
! 				/*
! 				 * 2. Join the cheapest partial grouped outer path (if one
! 				 * exists) to cheapest_safe_inner (there's no reason to look
! 				 * for another inner path than what we used for non-grouped
! 				 * partial join path).
! 				 */
! 				if (outerrel->gpi != NULL &&
! 					outerrel->gpi->partial_pathlist != NIL &&
! 					innerrel->gpi == NULL &&
! 					cheapest_safe_inner != NULL)
! 				{
! 					Path	*outer_path;
! 
! 					outer_path = (Path *)
! 						linitial(outerrel->gpi->partial_pathlist);
! 
! 					try_grouped_hashjoin_path(root, joinrel, outer_path,
! 											  cheapest_safe_inner,
! 											  hashclauses,
! 											  jointype, extra, false, true);
! 				}
! 
! 				/*
! 				 * 3. Join the cheapest_partial_outer path (again, no reason
! 				 * to use different outer path than the one we used for plain
! 				 * partial join) to the cheapest grouped inner path if the
! 				 * latter exists and is parallel-safe.
! 				 */
! 				if (innerrel->gpi != NULL &&
! 					innerrel->gpi->pathlist != NIL &&
! 					outerrel->gpi == NULL)
! 				{
! 					Path	*inner_path;
! 
! 					inner_path = (Path *) linitial(innerrel->gpi->pathlist);
! 
! 					if (inner_path->parallel_safe)
! 						try_grouped_hashjoin_path(root, joinrel,
! 												  cheapest_partial_outer,
! 												  inner_path,
! 												  hashclauses,
! 												  jointype, extra,
! 												  false, true);
! 				}
! 
! 				/*
! 				 * Other combinations seem impossible because: 1. At most 1
! 				 * input relation of the join can be grouped, 2. the inner
! 				 * path must not be partial.
! 				 */
! 			}
! 		}
! 	}
! }
! 
! /*
!  * Do the input paths emit all the aggregates contained in the grouped target
!  * of the join?
!  *
!  * The point is that one input relation might be unable to evaluate some
!  * aggregate(s), so it'll only generate plain paths. It's wrong to combine
!  * such plain paths with grouped ones that the other input rel might be able
!  * to generate because the result would miss the aggregate(s) the first
!  * relation failed to evaluate.
!  *
!  * TODO For better efficiency, consider storing Bitmapset of
!  * GroupedVarInfo.gvid in GroupedPathInfo.
!  */
! static bool
! is_grouped_join_target_complete(PlannerInfo *root, PathTarget *jointarget,
! 								Path *outer_path, Path *inner_path)
! {
! 	RelOptInfo	*outer_rel = outer_path->parent;
! 	RelOptInfo	*inner_rel = inner_path->parent;
! 	ListCell	*l1;
! 
! 	/*
! 	 * Join of two grouped relations is not supported.
! 	 *
! 	 * This actually isn't a check of target completeness --- can it be located
! 	 * elsewhere?
! 	 */
! 	if (outer_rel->gpi != NULL && inner_rel->gpi != NULL)
! 		return false;
! 
! 	foreach(l1, jointarget->exprs)
! 	{
! 		Expr	*expr = (Expr *) lfirst(l1);
! 		GroupedVar	*gvar;
! 		GroupedVarInfo	*gvi = NULL;
! 		ListCell	*l2;
! 		bool	found = false;
! 
! 		/* Only interested in aggregates. */
! 		if (!IsA(expr, GroupedVar))
! 			continue;
! 
! 		gvar = castNode(GroupedVar, expr);
! 
! 		/* Find the corresponding GroupedVarInfo. */
! 		foreach(l2, root->grouped_var_list)
! 		{
! 			GroupedVarInfo	*gvi_tmp = castNode(GroupedVarInfo, lfirst(l2));
! 
! 			if (gvi_tmp->gvid == gvar->gvid)
! 			{
! 				gvi = gvi_tmp;
! 				break;
! 			}
! 		}
! 		Assert(gvi != NULL);
! 
! 		/*
! 		 * If any aggregate references both input relations, something went
! 		 * wrong during construction of one of the input targets: one input
! 		 * rel is grouped, but no grouping target should have been created for
! 		 * it if some aggregate required more than that input rel.
! 		 */
! 		Assert(gvi->gv_eval_at == NULL ||
! 			   !(bms_overlap(gvi->gv_eval_at, outer_rel->relids) &&
! 				 bms_overlap(gvi->gv_eval_at, inner_rel->relids)));
! 
! 		/*
! 		 * If the aggregate belongs to the plain relation, it probably
! 		 * means that non-grouping expression made aggregation of that
! 		 * input relation impossible. Since that expression is not
! 		 * necessarily emitted by the current join, aggregation might be
! 		 * possible here. On the other hand, aggregation of a join which
! 		 * already contains a grouped relation does not seem too
! 		 * beneficial.
! 		 *
! 		 * XXX The condition below is also met if the query contains both
! 		 * "star aggregate" and a normal one. Since the earlier can be
! 		 * "star aggregate" and a normal one. Since the former can be
! 		 * 2 grouped relations, join of arbitrary 2 relations will always
! 		 * result in a plain relation.
! 		 *
! 		 * XXX If we conclude that aggregation is worthwhile, only consider
! 		 * this test failed if target usable for aggregation cannot be
! 		 * created (i.e. the non-grouping expression is in the output of
! 		 * the current join).
! 		 */
! 		if ((outer_rel->gpi == NULL &&
! 			 bms_overlap(gvi->gv_eval_at, outer_rel->relids))
! 			|| (inner_rel->gpi == NULL &&
! 				bms_overlap(gvi->gv_eval_at, inner_rel->relids)))
! 			return false;
! 
! 		/* Look for the aggregate in the input targets. */
! 		if (outer_rel->gpi != NULL)
! 		{
! 			/* No more than one input path should be grouped. */
! 			Assert(inner_rel->gpi == NULL);
! 
! 			foreach(l2, outer_path->pathtarget->exprs)
! 			{
! 				expr = (Expr *) lfirst(l2);
! 
! 				if (!IsA(expr, GroupedVar))
! 					continue;
! 
! 				gvar = castNode(GroupedVar, expr);
! 				if (gvar->gvid == gvi->gvid)
! 				{
! 					found = true;
! 					break;
! 				}
! 			}
  		}
+ 		else if (!found && inner_rel->gpi != NULL)
+ 		{
+ 			Assert(outer_rel->gpi == NULL);
+ 
+ 			foreach(l2, inner_path->pathtarget->exprs)
+ 			{
+ 				expr = (Expr *) lfirst(l2);
+ 
+ 				if (!IsA(expr, GroupedVar))
+ 					continue;
+ 
+ 				gvar = castNode(GroupedVar, expr);
+ 				if (gvar->gvid == gvi->gvid)
+ 				{
+ 					found = true;
+ 					break;
+ 				}
+ 			}
+ 		}
+ 
+ 		/* Even a single missing aggregate causes the whole test to fail. */
+ 		if (!found)
+ 			return false;
  	}
+ 
+ 	return true;
  }
  
  /*
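
Note on is_grouped_join_target_complete() above: the rule it applies can be
tried out in isolation with the small stand-alone sketch below. This is only
an illustration under loud assumptions, not code from the patch: relids are
modelled as plain bit masks instead of Bitmapsets, and ToyAggregate, ToyInput,
toy_input_emits and toy_join_target_complete are hypothetical names that do
not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL types. */
typedef unsigned int RelidMask;	/* one bit per base relation */

typedef struct ToyAggregate
{
	int			gvid;		/* identifier of the aggregate */
	RelidMask	eval_at;	/* relations referenced by its arguments */
} ToyAggregate;

typedef struct ToyInput
{
	RelidMask	relids;		/* relations covered by this join input */
	bool		grouped;	/* does the input path emit aggregates? */
	const int  *gvids;		/* aggregates emitted by the input path */
	size_t		ngvids;
} ToyInput;

/* Does a grouped input emit the aggregate identified by gvid? */
static bool
toy_input_emits(const ToyInput *input, int gvid)
{
	size_t		i;

	if (!input->grouped)
		return false;
	for (i = 0; i < input->ngvids; i++)
	{
		if (input->gvids[i] == gvid)
			return true;
	}
	return false;
}

/* True if every aggregate of the join target is emitted by an input. */
static bool
toy_join_target_complete(const ToyAggregate *aggs, size_t naggs,
						 const ToyInput *outer, const ToyInput *inner)
{
	size_t		i;

	/* Join of two grouped inputs is not supported. */
	if (outer->grouped && inner->grouped)
		return false;

	for (i = 0; i < naggs; i++)
	{
		const ToyAggregate *agg = &aggs[i];

		/* No aggregate may need both inputs at once. */
		assert(!((agg->eval_at & outer->relids) &&
				 (agg->eval_at & inner->relids)));

		/* Aggregate belongs to a plain input, so nobody evaluated it. */
		if ((!outer->grouped && (agg->eval_at & outer->relids)) ||
			(!inner->grouped && (agg->eval_at & inner->relids)))
			return false;

		/* Even a single missing aggregate fails the whole test. */
		if (!toy_input_emits(outer, agg->gvid) &&
			!toy_input_emits(inner, agg->gvid))
			return false;
	}

	return true;
}
```

With "a" grouped and emitting sum(a.x) and "b" plain, a join target that holds
only sum(a.x) passes, while a target that also holds sum(b.y) fails. That is
the case described in the hash join comment earlier in this file, where the
only option left is to join the plain rels and aggregate the result.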
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
new file mode 100644
index 5a68de3..ea24ed9
*** a/src/backend/optimizer/path/joinrels.c
--- b/src/backend/optimizer/path/joinrels.c
***************
*** 14,23 ****
--- 14,29 ----
   */
  #include "postgres.h"
  
+ #include "miscadmin.h"
+ #include "nodes/relation.h"
+ #include "optimizer/clauses.h"
  #include "optimizer/joininfo.h"
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
+ #include "optimizer/prep.h"
+ #include "optimizer/cost.h"
  #include "utils/memutils.h"
+ #include "utils/lsyscache.h"
  
  
  static void make_rels_by_clause_joins(PlannerInfo *root,
*************** static void make_rels_by_clauseless_join
*** 29,40 ****
  static bool has_join_restriction(PlannerInfo *root, RelOptInfo *rel);
  static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
  static bool is_dummy_rel(RelOptInfo *rel);
- static void mark_dummy_rel(RelOptInfo *rel);
  static bool restriction_is_constant_false(List *restrictlist,
  							  bool only_pushed_down);
  static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
  							RelOptInfo *rel2, RelOptInfo *joinrel,
  							SpecialJoinInfo *sjinfo, List *restrictlist);
  
  
  /*
--- 35,53 ----
  static bool has_join_restriction(PlannerInfo *root, RelOptInfo *rel);
  static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
  static bool is_dummy_rel(RelOptInfo *rel);
  static bool restriction_is_constant_false(List *restrictlist,
  							  bool only_pushed_down);
  static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
  							RelOptInfo *rel2, RelOptInfo *joinrel,
  							SpecialJoinInfo *sjinfo, List *restrictlist);
+ static void try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1,
+ 						  RelOptInfo *rel2, RelOptInfo *joinrel,
+ 						  SpecialJoinInfo *parent_sjinfo,
+ 						  List *parent_restrictlist);
+ static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel);
+ static void build_joinrel_partition_bounds(RelOptInfo *rel1, RelOptInfo *rel2,
+ 							   RelOptInfo *joinrel, JoinType jointype,
+ 							   List **rel1_parts, List **rel2_parts);
  
  
  /*
*************** make_join_rel(PlannerInfo *root, RelOptI
*** 731,736 ****
--- 744,752 ----
  	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
  								restrictlist);
  
+ 	/* Apply partition-wise join technique, if possible. */
+ 	try_partition_wise_join(root, rel1, rel2, joinrel, sjinfo, restrictlist);
+ 
  	bms_free(joinrelids);
  
  	return joinrel;
*************** is_dummy_rel(RelOptInfo *rel)
*** 1197,1203 ****
   * is that the best solution is to explicitly make the dummy path in the same
   * context the given RelOptInfo is in.
   */
! static void
  mark_dummy_rel(RelOptInfo *rel)
  {
  	MemoryContext oldcontext;
--- 1213,1219 ----
   * is that the best solution is to explicitly make the dummy path in the same
   * context the given RelOptInfo is in.
   */
! void
  mark_dummy_rel(RelOptInfo *rel)
  {
  	MemoryContext oldcontext;
*************** mark_dummy_rel(RelOptInfo *rel)
*** 1217,1223 ****
  	rel->partial_pathlist = NIL;
  
  	/* Set up the dummy path */
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL));
  
  	/* Set or update cheapest_total_path and related fields */
  	set_cheapest(rel);
--- 1233,1239 ----
  	rel->partial_pathlist = NIL;
  
  	/* Set up the dummy path */
! 	add_path(rel, (Path *) create_append_path(rel, NIL, NULL, 0, NIL), false);
  
  	/* Set or update cheapest_total_path and related fields */
  	set_cheapest(rel);
*************** restriction_is_constant_false(List *rest
*** 1268,1270 ****
--- 1284,1712 ----
  	}
  	return false;
  }
+ 
+ /*
+  * Assess whether a join between two given partitioned relations can be broken
+  * down into joins between matching partitions, a technique called
+  * "partition-wise join".
+  *
+  * Partition-wise join is possible when a. Joining relations have same
+  * partitioning scheme b. There exists an equi-join between the partition keys
+  * of the two relations.
+  *
+  * Partition-wise join is planned as follows (details: optimizer/README.)
+  *
+  * 1. Create the RelOptInfos for joins between matching partitions, i.e.
+  * child-joins, and add paths to those.
+  *
+  * 2. Add "append" paths to join between parent relations. The second phase is
+  * implemented by generate_partition_wise_join_paths().
+  *
+  * The RelOptInfo, SpecialJoinInfo and restrictlist for each child join are
+  * obtained by translating the respective parent join structures.
+  */
+ static void
+ try_partition_wise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
+ 						RelOptInfo *joinrel, SpecialJoinInfo *parent_sjinfo,
+ 						List *parent_restrictlist)
+ {
+ 	int			nparts;
+ 	int			cnt_parts;
+ 	ListCell   *lc1;
+ 	ListCell   *lc2;
+ 	List	   *rel1_parts;
+ 	List	   *rel2_parts;
+ 	bool		is_strict;
+ 
+ 	/* Guard against stack overflow due to overly deep partition hierarchy. */
+ 	check_stack_depth();
+ 
+ 	/* Nothing to do, if the join relation is not partitioned. */
+ 	if (!joinrel->part_scheme)
+ 		return;
+ 
+ 	/*
+ 	 * set_append_rel_pathlist() may not create paths in children of an empty
+ 	 * partitioned table and so we can not add paths to a child-joins when one
+ 	 * partitioned table and so we can not add paths to child-joins when one
+ 	 * unpartitioned.
+ 	 */
+ 	if (IS_DUMMY_REL(rel1) || IS_DUMMY_REL(rel2))
+ 		return;
+ 
+ 	/*
+ 	 * Since this join relation is partitioned, all the base relations
+ 	 * participating in this join must be partitioned and so are all the
+ 	 * intermediate join relations.
+ 	 */
+ 	Assert(rel1->part_scheme && rel2->part_scheme);
+ 
+ 	/*
+ 	 * Every pair of joining relations we see here should have an equi-join
+ 	 * between partition keys if this join has been deemed as a partitioned
+ 	 * join. See build_joinrel_partition_info() for reasons.
+ 	 */
+ 	Assert(have_partkey_equi_join(rel1, rel2, parent_sjinfo->jointype,
+ 								  parent_restrictlist, &is_strict));
+ 
+ 	/*
+ 	 * The partition scheme of the join relation should match that of the
+ 	 * joining relations.
+ 	 */
+ 	Assert(joinrel->part_scheme == rel1->part_scheme &&
+ 		   joinrel->part_scheme == rel2->part_scheme);
+ 
+ 	/* We should have RelOptInfos of the partitions available. */
+ 	Assert(rel1->part_rels && rel2->part_rels);
+ 
+ 	/*
+ 	 * Calculate bounds for the join relation. If we can not come up with joint
+ 	 * bounds, we can not use partition-wise join.
+ 	 */
+ 	build_joinrel_partition_bounds(rel1, rel2, joinrel,
+ 								   parent_sjinfo->jointype, &rel1_parts,
+ 								   &rel2_parts);
+ 	if (!joinrel->boundinfo)
+ 		return;
+ 
+ 	Assert(list_length(rel1_parts) == list_length(rel2_parts));
+ 	Assert(joinrel->nparts == list_length(rel1_parts));
+ 	Assert(joinrel->nparts > 0);
+ 
+ 	nparts = joinrel->nparts;
+ 
+ 	elog(DEBUG3, "join between relations %s and %s is considered for partition-wise join.",
+ 		 bmsToString(rel1->relids), bmsToString(rel2->relids));
+ 
+ 	/* Allocate space to hold child-join RelOptInfos, if not already done. */
+ 	if (!joinrel->part_rels)
+ 		joinrel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+ 
+ 	/*
+ 	 * Create child join relations for this partitioned join, if those don't
+ 	 * exist. Add paths to child-joins for a pair of child relations
+ 	 * corresponding to the given pair of parent relations.
+ 	 */
+ 	cnt_parts = 0;
+ 	forboth (lc1, rel1_parts, lc2, rel2_parts)
+ 	{
+ 		RelOptInfo *child_rel1 = lfirst(lc1);
+ 		RelOptInfo *child_rel2 = lfirst(lc2);
+ 		SpecialJoinInfo	*child_sjinfo;
+ 		List   *child_restrictlist;
+ 		RelOptInfo *child_joinrel;
+ 		Relids	child_joinrelids;
+ 		AppendRelInfo **appinfos;
+ 		int		nappinfos;
+ 
+ 		/* We should never try to join two overlapping sets of rels. */
+ 		Assert(!bms_overlap(child_rel1->relids, child_rel2->relids));
+ 		child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids);
+ 		appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos);
+ 
+ 		/*
+ 		 * Construct SpecialJoinInfo from parent join relation's
+ 		 * SpecialJoinInfo.
+ 		 */
+ 		child_sjinfo = build_child_join_sjinfo(root, parent_sjinfo,
+ 											   child_rel1->relids,
+ 											   child_rel2->relids);
+ 
+ 		/*
+ 		 * Construct restrictions applicable to the child join from
+ 		 * those applicable to the parent join.
+ 		 */
+ 		child_restrictlist = (List *) adjust_appendrel_attrs(root,
+ 												  (Node *) parent_restrictlist,
+ 														  nappinfos, appinfos);
+ 
+ 		child_joinrel = joinrel->part_rels[cnt_parts];
+ 		if (!child_joinrel)
+ 		{
+ 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
+ 												 joinrel, child_restrictlist,
+ 												 child_sjinfo,
+ 												 child_sjinfo->jointype);
+ 			joinrel->part_rels[cnt_parts] = child_joinrel;
+ 		}
+ 
+ 		Assert(bms_equal(child_joinrel->relids, child_joinrelids));
+ 
+ 		/* Also translate expressions that AggPath will use in its target. */
+ 		if (child_joinrel->gpi != NULL)
+ 		{
+ 			Assert(child_joinrel->gpi->target != NULL);
+ 
+ 			child_joinrel->gpi->target->exprs =
+ 				(List *) adjust_appendrel_attrs(root,
+ 												(Node *) child_joinrel->gpi->target->exprs,
+ 												nappinfos, appinfos);
+ 		}
+ 
+ 		populate_joinrel_with_paths(root, child_rel1, child_rel2,
+ 									child_joinrel, child_sjinfo,
+ 									child_restrictlist);
+ 
+ 		pfree(appinfos);
+ 
+ 		/*
+ 		 * If the child relations themselves are partitioned, try partition-wise join
+ 		 * recursively.
+ 		 */
+ 		try_partition_wise_join(root, child_rel1, child_rel2, child_joinrel,
+ 								child_sjinfo, child_restrictlist);
+ 		cnt_parts++;
+ 	}
+ }
+ 
+ /*
+  * Returns true if there exists an equi-join condition for each pair of
+  * partition keys from the given relations being joined.
+  */
+ bool
+ have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype,
+ 					   List *restrictlist, bool *is_strict)
+ {
+ 	PartitionScheme	part_scheme = rel1->part_scheme;
+ 	ListCell	*lc;
+ 	int		cnt_pks;
+ 	int		num_pks;
+ 	bool   *pk_has_clause;
+ 
+ 	*is_strict = false;
+ 
+ 	/*
+ 	 * This function should be called when the joining relations have same
+ 	 * partitioning scheme.
+ 	 */
+ 	Assert(rel1->part_scheme == rel2->part_scheme);
+ 	Assert(part_scheme);
+ 
+ 	num_pks = part_scheme->partnatts;
+ 
+ 	pk_has_clause = (bool *) palloc0(sizeof(bool) * num_pks);
+ 
+ 	foreach (lc, restrictlist)
+ 	{
+ 		RestrictInfo *rinfo = lfirst(lc);
+ 		OpExpr		 *opexpr;
+ 		Expr		 *expr1;
+ 		Expr		 *expr2;
+ 		int		ipk1;
+ 		int		ipk2;
+ 
+ 		/* If processing an outer join, only use its own join clauses. */
+ 		if (IS_OUTER_JOIN(jointype) && rinfo->is_pushed_down)
+ 			continue;
+ 
+ 		/* Skip clauses which can not be used for a join. */
+ 		if (!rinfo->can_join)
+ 			continue;
+ 
+ 		/* Skip clauses which are not equality conditions. */
+ 		if (!rinfo->mergeopfamilies)
+ 			continue;
+ 
+ 		opexpr = (OpExpr *) rinfo->clause;
+ 		Assert(is_opclause(opexpr));
+ 
+ 		/*
+ 		 * The equi-join between partition keys is strict if equi-join between
+ 		 * at least one partition key is using a strict operator. See
+ 		 * explanation about outer join reordering identity 3 in
+ 		 * optimizer/README
+ 		 */
+ 		*is_strict = *is_strict || op_strict(opexpr->opno);
+ 
+ 		/* Match the operands to the relation. */
+ 		if (bms_is_subset(rinfo->left_relids, rel1->relids) &&
+ 			bms_is_subset(rinfo->right_relids, rel2->relids))
+ 		{
+ 			expr1 = linitial(opexpr->args);
+ 			expr2 = lsecond(opexpr->args);
+ 		}
+ 		else if (bms_is_subset(rinfo->left_relids, rel2->relids) &&
+ 				 bms_is_subset(rinfo->right_relids, rel1->relids))
+ 		{
+ 			expr1 = lsecond(opexpr->args);
+ 			expr2 = linitial(opexpr->args);
+ 		}
+ 		else
+ 			continue;
+ 
+ 		/*
+ 		 * Only clauses referencing the partition keys are useful for
+ 		 * partition-wise join.
+ 		 */
+ 		ipk1 = match_expr_to_partition_keys(expr1, rel1);
+ 		if (ipk1 < 0)
+ 			continue;
+ 		ipk2 = match_expr_to_partition_keys(expr2, rel2);
+ 		if (ipk2 < 0)
+ 			continue;
+ 
+ 		/*
+ 		 * If the clause refers to keys at different cardinal positions in the
+ 		 * partition keys of joining relations, it can not be used for
+ 		 * partition-wise join.
+ 		 */
+ 		if (ipk1 != ipk2)
+ 			continue;
+ 
+ 		/*
+ 		 * The clause allows partition-wise join only if it uses the same
+ 		 * operator family as that specified by the partition key.
+ 		 */
+ 		if (!list_member_oid(rinfo->mergeopfamilies,
+ 							 part_scheme->partopfamily[ipk1]))
+ 			continue;
+ 
+ 		/* Mark the partition key as having an equi-join clause. */
+ 		pk_has_clause[ipk1] = true;
+ 	}
+ 
+ 	/* Check whether every partition key has an equi-join condition. */
+ 	for (cnt_pks = 0; cnt_pks < num_pks; cnt_pks++)
+ 	{
+ 		if (!pk_has_clause[cnt_pks])
+ 		{
+ 			pfree(pk_has_clause);
+ 			return false;
+ 		}
+ 	}
+ 
+ 	pfree(pk_has_clause);
+ 	return true;
+ }
+ 
+ /*
+  * Find the partition key from the given relation matching the given
+  * expression. If found, return the index of the partition key, else return -1.
+  */
+ static int
+ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel)
+ {
+ 	int		cnt_pks;
+ 	int		num_pks;
+ 
+ 	/* This function should be called only for partitioned relations. */
+ 	Assert(rel->part_scheme);
+ 
+ 	num_pks = rel->part_scheme->partnatts;
+ 
+ 	/* Remove the relabel decoration. */
+ 	while (IsA(expr, RelabelType))
+ 		expr = (Expr *) (castNode(RelabelType, expr))->arg;
+ 
+ 	for (cnt_pks = 0; cnt_pks < num_pks; cnt_pks++)
+ 	{
+ 		List	 *pkexprs = rel->partexprs[cnt_pks];
+ 		ListCell *lc;
+ 
+ 		foreach(lc, pkexprs)
+ 		{
+ 			Expr *pkexpr = lfirst(lc);
+ 			if (equal(pkexpr, expr))
+ 				return cnt_pks;
+ 		}
+ 	}
+ 
+ 	return -1;
+ }
+ 
+ /*
+  * Calculate the bounds/lists of the join relation based on partition bounds of the
+  * joining relations. Also returns the matching partitions from the joining
+  * relations.
+  *
+  * As of now, it simply checks whether the bounds/lists of the joining
+  * relations match and returns bounds/lists of the first relation. In future
+  * this function will be expanded to merge the bounds/lists from the joining
+  * relations to produce the bounds/lists of the join relation. If the function
+  * fails to merge the bounds/lists, joinrel gets no bounds and the lists are NIL.
+  *
+  * The function also returns two lists of RelOptInfos, one for each joining
+  * relation. The RelOptInfos at the same position in each of the lists give the
+  * partitions with matching bounds which can be joined to produce join relation
+  * corresponding to the merged partition bounds corresponding to that position.
+  * When there doesn't exist a matching partition on either side, corresponding
+  * RelOptInfo will be NULL.
+  */
+ static void
+ build_joinrel_partition_bounds(RelOptInfo *rel1, RelOptInfo *rel2,
+ 							   RelOptInfo *joinrel, JoinType jointype,
+ 							   List **rel1_parts, List **rel2_parts)
+ {
+ 	PartitionScheme	part_scheme;
+ 	int			cnt;
+ 	int			nparts;
+ 	int16	   *parttyplen;
+ 	bool	   *parttypbyval;
+ 
+ 	Assert(rel1->part_scheme == rel2->part_scheme);
+ 	Assert(rel1->nparts == rel2->nparts);
+ 	*rel1_parts = NIL;
+ 	*rel2_parts = NIL;
+ 
+ 	part_scheme = rel1->part_scheme;
+ 
+ 	/*
+ 	 * Ideally, we should be able to join two relations which have different
+ 	 * number of partitions as long as the bounds of partitions available on
+ 	 * both the sides match. But for now, we need exact same number of
+ 	 * partitions on both the sides.
+ 	 */
+ 	if (rel1->nparts != rel2->nparts)
+ 	{
+ 		/*
+ 		 * If this pair of joining relations did not have same number of
+ 		 * partitions no other pair can have same number of partitions.
+ 		 */
+ 		Assert(!joinrel->boundinfo && joinrel->nparts == 0);
+ 		return;
+ 	}
+ 
+ 
+ 	parttyplen = (int16 *) palloc(sizeof(int16) * part_scheme->partnatts);
+ 	parttypbyval = (bool *) palloc(sizeof(bool) * part_scheme->partnatts);
+ 	for (cnt = 0; cnt < part_scheme->partnatts; cnt++)
+ 		get_typlenbyval(part_scheme->partopcintype[cnt], &parttyplen[cnt],
+ 						&parttypbyval[cnt]);
+ 
+ 	if (!partition_bounds_equal(part_scheme->partnatts, parttyplen,
+ 								parttypbyval, rel1->boundinfo,
+ 								rel2->boundinfo))
+ 	{
+ 		/*
+ 		 * If this pair of joining relations did not have same partition bounds
+ 		 * no other pair can have same partition bounds.
+ 		 */
+ 		Assert(!joinrel->boundinfo && joinrel->nparts == 0);
+ 		return;
+ 	}
+ 
+ 	nparts = rel1->nparts;
+ 	for (cnt = 0; cnt < nparts; cnt++)
+ 	{
+ 		*rel1_parts = lappend(*rel1_parts, rel1->part_rels[cnt]);
+ 		*rel2_parts = lappend(*rel2_parts, rel2->part_rels[cnt]);
+ 	}
+ 
+ 	/* Set the partition bounds if not already set. */
+ 	if (!joinrel->boundinfo)
+ 	{
+ 		joinrel->boundinfo = rel1->boundinfo;
+ 		joinrel->nparts = rel1->nparts;
+ 	}
+ 	else
+ 	{
+ 		/* Verify existing bounds. */
+ 		Assert(partition_bounds_equal(part_scheme->partnatts, parttyplen,
+ 									  parttypbyval, joinrel->boundinfo,
+ 									  rel1->boundinfo));
+ 		Assert(joinrel->nparts == rel1->nparts);
+ 	}
+ 
+ 	pfree(parttyplen);
+ 	pfree(parttypbyval);
+ }
diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
new file mode 100644
index a2fe661..91d855c
*** a/src/backend/optimizer/path/tidpath.c
--- b/src/backend/optimizer/path/tidpath.c
*************** create_tidscan_paths(PlannerInfo *root,
*** 266,270 ****
  
  	if (tidquals)
  		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
! 												   required_outer));
  }
--- 266,270 ----
  
  	if (tidquals)
  		add_path(rel, (Path *) create_tidscan_path(root, rel, tidquals,
! 												   required_outer), false);
  }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
new file mode 100644
index 95e6eb7..3f1389f
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** static Plan *prepare_sort_from_pathkeys(
*** 252,258 ****
  static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec,
  					   TargetEntry *tle,
  					   Relids relids);
! static Sort *make_sort_from_pathkeys(Plan *lefttree, List *pathkeys);
  static Sort *make_sort_from_groupcols(List *groupcls,
  						 AttrNumber *grpColIdx,
  						 Plan *lefttree);
--- 252,259 ----
  static EquivalenceMember *find_ec_member_for_tle(EquivalenceClass *ec,
  					   TargetEntry *tle,
  					   Relids relids);
! static Sort *make_sort_from_pathkeys(Plan *lefttree, List *pathkeys,
! 									 Relids relids);
  static Sort *make_sort_from_groupcols(List *groupcls,
  						 AttrNumber *grpColIdx,
  						 Plan *lefttree);
*************** create_sort_plan(PlannerInfo *root, Sort
*** 1650,1656 ****
  	subplan = create_plan_recurse(root, best_path->subpath,
  								  flags | CP_SMALL_TLIST);
  
! 	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys);
  
  	copy_generic_path_info(&plan->plan, (Path *) best_path);
  
--- 1651,1657 ----
  	subplan = create_plan_recurse(root, best_path->subpath,
  								  flags | CP_SMALL_TLIST);
  
! 	plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
  
  	copy_generic_path_info(&plan->plan, (Path *) best_path);
  
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3767,3772 ****
--- 3768,3775 ----
  	ListCell   *lc;
  	ListCell   *lop;
  	ListCell   *lip;
+ 	Path	   *outer_path = best_path->jpath.outerjoinpath;
+ 	Path	   *inner_path = best_path->jpath.innerjoinpath;
  
  	/*
  	 * MergeJoin can project, so we don't have to demand exact tlists from the
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3830,3837 ****
  	 */
  	if (best_path->outersortkeys)
  	{
  		Sort	   *sort = make_sort_from_pathkeys(outer_plan,
! 												   best_path->outersortkeys);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		outer_plan = (Plan *) sort;
--- 3833,3842 ----
  	 */
  	if (best_path->outersortkeys)
  	{
+ 		Relids		outer_relids = outer_path->parent->relids;
  		Sort	   *sort = make_sort_from_pathkeys(outer_plan,
! 												   best_path->outersortkeys,
! 												   outer_relids);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		outer_plan = (Plan *) sort;
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3842,3849 ****
  
  	if (best_path->innersortkeys)
  	{
  		Sort	   *sort = make_sort_from_pathkeys(inner_plan,
! 												   best_path->innersortkeys);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		inner_plan = (Plan *) sort;
--- 3847,3856 ----
  
  	if (best_path->innersortkeys)
  	{
+ 		Relids		inner_relids = inner_path->parent->relids;
  		Sort	   *sort = make_sort_from_pathkeys(inner_plan,
! 												   best_path->innersortkeys,
! 												   inner_relids);
  
  		label_sort_with_costsize(root, sort, -1.0);
  		inner_plan = (Plan *) sort;
*************** prepare_sort_from_pathkeys(Plan *lefttre
*** 5687,5697 ****
  					continue;
  
  				/*
! 				 * Ignore child members unless they match the rel being
  				 * sorted.
  				 */
  				if (em->em_is_child &&
! 					!bms_equal(em->em_relids, relids))
  					continue;
  
  				sortexpr = em->em_expr;
--- 5694,5704 ----
  					continue;
  
  				/*
! 				 * Ignore child members unless they belong to the rel being
  				 * sorted.
  				 */
  				if (em->em_is_child &&
! 					!bms_is_subset(em->em_relids, relids))
  					continue;
  
  				sortexpr = em->em_expr;
*************** find_ec_member_for_tle(EquivalenceClass
*** 5803,5812 ****
  			continue;
  
  		/*
! 		 * Ignore child members unless they match the rel being sorted.
  		 */
  		if (em->em_is_child &&
! 			!bms_equal(em->em_relids, relids))
  			continue;
  
  		/* Match if same expression (after stripping relabel) */
--- 5810,5819 ----
  			continue;
  
  		/*
! 		 * Ignore child members unless they belong to the rel being sorted.
  		 */
  		if (em->em_is_child &&
! 			!bms_is_subset(em->em_relids, relids))
  			continue;
  
  		/* Match if same expression (after stripping relabel) */
*************** find_ec_member_for_tle(EquivalenceClass
*** 5827,5835 ****
   *
   *	  'lefttree' is the node which yields input tuples
   *	  'pathkeys' is the list of pathkeys by which the result is to be sorted
   */
  static Sort *
! make_sort_from_pathkeys(Plan *lefttree, List *pathkeys)
  {
  	int			numsortkeys;
  	AttrNumber *sortColIdx;
--- 5834,5843 ----
   *
   *	  'lefttree' is the node which yields input tuples
   *	  'pathkeys' is the list of pathkeys by which the result is to be sorted
+  *	  'relids' is the set of relations required by prepare_sort_from_pathkeys()
   */
  static Sort *
! make_sort_from_pathkeys(Plan *lefttree, List *pathkeys, Relids relids)
  {
  	int			numsortkeys;
  	AttrNumber *sortColIdx;
*************** make_sort_from_pathkeys(Plan *lefttree,
*** 5839,5845 ****
  
  	/* Compute sort column info, and adjust lefttree as needed */
  	lefttree = prepare_sort_from_pathkeys(lefttree, pathkeys,
! 										  NULL,
  										  NULL,
  										  false,
  										  &numsortkeys,
--- 5847,5853 ----
  
  	/* Compute sort column info, and adjust lefttree as needed */
  	lefttree = prepare_sort_from_pathkeys(lefttree, pathkeys,
! 										  relids,
  										  NULL,
  										  false,
  										  &numsortkeys,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
new file mode 100644
index ebd442a..0313c71
*** a/src/backend/optimizer/plan/initsplan.c
--- b/src/backend/optimizer/plan/initsplan.c
***************
*** 14,20 ****
--- 14,22 ----
   */
  #include "postgres.h"
  
+ #include "access/sysattr.h"
  #include "catalog/pg_type.h"
+ #include "catalog/pg_class.h"
  #include "nodes/nodeFuncs.h"
  #include "optimizer/clauses.h"
  #include "optimizer/cost.h"
***************
*** 26,31 ****
--- 28,34 ----
  #include "optimizer/planner.h"
  #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/analyze.h"
  #include "rewrite/rewriteManip.h"
*************** typedef struct PostponedQual
*** 45,50 ****
--- 48,54 ----
  } PostponedQual;
  
  
+ static void create_grouped_var_infos(PlannerInfo *root);
  static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
  						   Index rtindex);
  static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
*************** add_vars_to_targetlist(PlannerInfo *root
*** 240,245 ****
--- 244,533 ----
  	}
  }
  
+ /*
+  * Add GroupedVarInfo to grouped_var_list for each aggregate and set up
+  * GroupedPathInfo for each base relation that can produce grouped paths.
+  *
+  * XXX In the future we might want to create GroupedVarInfo for grouping
+  * expressions too, so that grouping key is not limited to plain Var if the
+  * grouping takes place below the top-level join.
+  *
+  * root->group_pathkeys must be set up before this function is called.
+  */
+ void
+ add_grouping_info_to_base_rels(PlannerInfo *root)
+ {
+ 	int			i;
+ 
+ 	/* No grouping in the query? */
+ 	if (!root->parse->groupClause || root->group_pathkeys == NIL)
+ 		return;
+ 
+ 	/* TODO This is just for PoC. Relax the limitation later. */
+ 	if (root->parse->havingQual)
+ 		return;
+ 
+ 	/* Create GroupedVarInfo per (distinct) aggregate. */
+ 	create_grouped_var_infos(root);
+ 
+ 	/* Is no grouping possible below the top-level join? */
+ 	if (root->grouped_var_list == NIL)
+ 		return;
+ 
+ 	/* Process the individual base relations. */
+ 	for (i = 1; i < root->simple_rel_array_size; i++)
+ 	{
+ 		RelOptInfo	*rel = root->simple_rel_array[i];
+ 
+ 		/*
+ 		 * "other rels" will have their targets built later, by translation of
+ 		 * the target of the parent rel - see set_append_rel_size. If we
+ 		 * wanted to prepare the child rels here, we'd need another pass over
+ 		 * simple_rel_array.
+ 		 */
+ 		if (rel != NULL && rel->reloptkind == RELOPT_BASEREL)
+ 			prepare_rel_for_grouping(root, rel);
+ 	}
+ }
+ 
+ /*
+  * Create GroupedVarInfo for each distinct aggregate.
+  *
+  * If any aggregate is not suitable, set root->grouped_var_list to NIL and
+  * return.
+  *
+  * TODO Include aggregates from HAVING clause.
+  */
+ static void
+ create_grouped_var_infos(PlannerInfo *root)
+ {
+ 	List	   *tlist_exprs;
+ 	ListCell	*lc;
+ 
+ 	Assert(root->grouped_var_list == NIL);
+ 
+ 	/*
+ 	 * TODO Check if processed_tlist contains the HAVING aggregates. If not,
+ 	 * get them elsewhere.
+ 	 */
+ 	tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ 								  PVC_INCLUDE_AGGREGATES);
+ 	if (tlist_exprs == NIL)
+ 		return;
+ 
+ 	/* tlist_exprs may also contain Vars, but we only need Aggrefs. */
+ 	foreach(lc, tlist_exprs)
+ 	{
+ 		Expr	*expr = (Expr *) lfirst(lc);
+ 		Aggref	*aggref;
+ 		ListCell	*lc2;
+ 		GroupedVarInfo	*gvi;
+ 		bool	exists;
+ 
+ 		if (IsA(expr, Var))
+ 			continue;
+ 
+ 		aggref = castNode(Aggref, expr);
+ 
+ 		/* TODO Think if (some of) these can be handled. */
+ 		if (aggref->aggvariadic ||
+ 			aggref->aggdirectargs || aggref->aggorder ||
+ 			aggref->aggdistinct || aggref->aggfilter)
+ 		{
+ 			/*
+ 			 * Partial aggregation is not useful if at least one aggregate
+ 			 * cannot be evaluated below the top-level join.
+ 			 *
+ 			 * XXX Is it worth freeing the GroupedVarInfos and their subtrees?
+ 			 */
+ 			root->grouped_var_list = NIL;
+ 			break;
+ 		}
+ 
+ 		/* Does GroupedVarInfo for this aggregate already exist? */
+ 		exists = false;
+ 		foreach(lc2, root->grouped_var_list)
+ 		{
+ 			/* Compare the aggregate with the one already recorded. */
+ 			gvi = castNode(GroupedVarInfo, lfirst(lc2));
+ 
+ 			/* The list element is a GroupedVarInfo, so test its gvexpr. */
+ 			if (equal(aggref, gvi->gvexpr))
+ 			{
+ 				exists = true;
+ 				break;
+ 			}
+ 		}
+ 
+ 		/* Construct a new GroupedVarInfo if it does not exist yet. */
+ 		if (!exists)
+ 		{
+ 			Relids	relids;
+ 
+ 			/* TODO Initialize gv_width. */
+ 			gvi = makeNode(GroupedVarInfo);
+ 
+ 			gvi->gvid = list_length(root->grouped_var_list);
+ 			gvi->gvexpr = (Expr *) copyObject(aggref);
+ 			gvi->agg_partial = copyObject(aggref);
+ 			mark_partial_aggref(gvi->agg_partial, AGGSPLIT_INITIAL_SERIAL);
+ 
+ 			/* Find out where the aggregate should be evaluated. */
+ 			relids = pull_varnos((Node *) aggref);
+ 			if (!bms_is_empty(relids))
+ 				gvi->gv_eval_at = relids;
+ 			else
+ 			{
+ 				Assert(aggref->aggstar);
+ 				gvi->gv_eval_at = NULL;
+ 			}
+ 
+ 			root->grouped_var_list = lappend(root->grouped_var_list, gvi);
+ 		}
+ 	}
+ 
+ 	list_free(tlist_exprs);
+ }
+ 
+ /*
+  * Check if all the expressions of rel->reltarget can be used as grouping
+  * expressions and create target for grouped paths.
+  *
+  * If we succeed in creating the grouping target, also replace rel->reltarget
+  * with a new one that has sortgrouprefs initialized -- this is necessary for
+  * create_agg_plan to match the grouping clauses against the input target
+  * expressions.
+  *
+  * rel_agg_attrs is a set of attributes of the relation referenced by aggregate
+  * arguments. These can exist in the (plain) target without being grouping
+  * expressions.
+  *
+  * rel_agg_vars should be passed instead if rel is a join.
+  *
+  * TODO How about PHVs?
+  *
+  * TODO Make sure cost / width of both "result" and "plain" are correct.
+  */
+ PathTarget *
+ create_grouped_target(PlannerInfo *root, RelOptInfo *rel,
+ 					  Relids rel_agg_attrs, List *rel_agg_vars)
+ {
+ 	PathTarget	*result, *plain;
+ 	ListCell	*lc;
+ 
+ 	/* The target to be returned. */
+ 	result = create_empty_pathtarget();
+ 	/* The one to replace rel->reltarget. */
+ 	plain = create_empty_pathtarget();
+ 
+ 	foreach(lc, rel->reltarget->exprs)
+ 	{
+ 		Expr		*texpr;
+ 		Index		sortgroupref;
+ 		bool		agg_arg_only = false;
+ 
+ 		texpr = (Expr *) lfirst(lc);
+ 
+ 		sortgroupref = get_expr_sortgroupref(root, texpr);
+ 		if (sortgroupref > 0)
+ 		{
+ 			/* It's o.k. to use the target expression for grouping. */
+ 			add_column_to_pathtarget(result, texpr, sortgroupref);
+ 
+ 			/*
+ 			 * As for the plain target, add the original expression but set
+ 			 * sortgroupref in addition.
+ 			 */
+ 			add_column_to_pathtarget(plain, texpr, sortgroupref);
+ 
+ 			/* Process the next expression. */
+ 			continue;
+ 		}
+ 
+ 		/*
+ 		 * It may still be o.k. if the expression is only contained in Aggref
+ 		 * - then it's not expected in the grouped output.
+ 		 *
+ 		 * TODO Try to handle generic expression, not only Var. That might
+ 		 * require us to create rel->reltarget of the grouping rel in
+ 		 * parallel to that of the plain rel, and adding whole expressions
+ 		 * instead of individual vars.
+ 		 */
+ 		if (IsA(texpr, Var))
+ 		{
+ 			Var	*arg_var = castNode(Var, texpr);
+ 
+ 			if (rel->relid > 0)
+ 			{
+ 				AttrNumber	varattno;
+ 
+ 				/*
+ 				 * For a single relation we only need to check attribute
+ 				 * number.
+ 				 *
+ 				 * Apply the same offset that pull_varattnos() did.
+ 				 */
+ 				varattno = arg_var->varattno - FirstLowInvalidHeapAttributeNumber;
+ 
+ 				if (bms_is_member(varattno, rel_agg_attrs))
+ 					agg_arg_only = true;
+ 			}
+ 			else
+ 			{
+ 				ListCell	*lc2;
+ 
+ 				/* Join case. */
+ 				foreach(lc2, rel_agg_vars)
+ 				{
+ 					Var	*var = castNode(Var, lfirst(lc2));
+ 
+ 					if (var->varno == arg_var->varno &&
+ 						var->varattno == arg_var->varattno)
+ 					{
+ 						agg_arg_only = true;
+ 						break;
+ 					}
+ 				}
+ 			}
+ 
+ 			if (agg_arg_only)
+ 			{
+ 				/*
+ 				 * This expression is not suitable for grouping, but the
+ 				 * aggregation input target ought to stay complete.
+ 				 */
+ 				add_column_to_pathtarget(plain, texpr, 0);
+ 			}
+ 		}
+ 
+ 		/*
+ 		 * A single mismatched expression makes the whole relation useless
+ 		 * for grouping.
+ 		 */
+ 		if (!agg_arg_only)
+ 		{
+ 			/*
+ 			 * TODO This seems possible to happen multiple times per relation,
+ 			 * so result might be worth freeing. Implement free_pathtarget()?
+ 			 * Or mark the relation as inappropriate for grouping?
+ 			 */
+ 			/* TODO Free both result and plain. */
+ 			return NULL;
+ 		}
+ 	}
+ 
+ 	if (list_length(result->exprs) == 0)
+ 	{
+ 		/* TODO free_pathtarget(result); free_pathtarget(plain) */
+ 		result = NULL;
+ 	}
+ 
+ 	/* Apply the adjusted input target now that the replacement is complete. */
+ 	rel->reltarget = plain;
+ 
+ 	return result;
+ }
+ 
  
  /*****************************************************************************
   *
*************** create_lateral_join_info(PlannerInfo *ro
*** 629,639 ****
  	for (rti = 1; rti < root->simple_rel_array_size; rti++)
  	{
  		RelOptInfo *brel = root->simple_rel_array[rti];
  
! 		if (brel == NULL || brel->reloptkind != RELOPT_BASEREL)
  			continue;
  
! 		if (root->simple_rte_array[rti]->inh)
  		{
  			foreach(lc, root->append_rel_list)
  			{
--- 917,941 ----
  	for (rti = 1; rti < root->simple_rel_array_size; rti++)
  	{
  		RelOptInfo *brel = root->simple_rel_array[rti];
+ 		RangeTblEntry *brte = root->simple_rte_array[rti];
  
! 		if (brel == NULL)
  			continue;
  
! 		/*
! 		 * If an "other rel" RTE is a "partitioned table", we must propagate
! 		 * the lateral info inherited all the way from the root parent to its
! 		 * children. That's because the children are not linked directly with
! 		 * the root parent via AppendRelInfos, unlike in the case of a regular
! 		 * inheritance set (see expand_inherited_rtentry()).  Failing to
! 		 * do this would result in those children not getting marked with the
! 		 * appropriate lateral info.
! 		 */
! 		if (brel->reloptkind != RELOPT_BASEREL &&
! 			brte->relkind != RELKIND_PARTITIONED_TABLE)
! 			continue;
! 
! 		if (brte->inh)
  		{
  			foreach(lc, root->append_rel_list)
  			{
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
new file mode 100644
index 5565736..058af2c
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** preprocess_minmax_aggregates(PlannerInfo
*** 223,229 ****
  			 create_minmaxagg_path(root, grouped_rel,
  								   create_pathtarget(root, tlist),
  								   aggs_list,
! 								   (List *) parse->havingQual));
  }
  
  /*
--- 223,229 ----
  			 create_minmaxagg_path(root, grouped_rel,
  								   create_pathtarget(root, tlist),
  								   aggs_list,
! 								   (List *) parse->havingQual), false);
  }
  
  /*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
new file mode 100644
index ef0de3f..f70b445
*** a/src/backend/optimizer/plan/planmain.c
--- b/src/backend/optimizer/plan/planmain.c
*************** query_planner(PlannerInfo *root, List *t
*** 83,89 ****
  		add_path(final_rel, (Path *)
  				 create_result_path(root, final_rel,
  									final_rel->reltarget,
! 									(List *) parse->jointree->quals));
  
  		/* Select cheapest path (pretty easy in this case...) */
  		set_cheapest(final_rel);
--- 83,89 ----
  		add_path(final_rel, (Path *)
  				 create_result_path(root, final_rel,
  									final_rel->reltarget,
! 									(List *) parse->jointree->quals), false);
  
  		/* Select cheapest path (pretty easy in this case...) */
  		set_cheapest(final_rel);
*************** query_planner(PlannerInfo *root, List *t
*** 114,119 ****
--- 114,120 ----
  	root->full_join_clauses = NIL;
  	root->join_info_list = NIL;
  	root->placeholder_list = NIL;
+ 	root->grouped_var_list = NIL;
  	root->fkey_list = NIL;
  	root->initial_rels = NIL;
  
*************** query_planner(PlannerInfo *root, List *t
*** 177,182 ****
--- 178,191 ----
  	(*qp_callback) (root, qp_extra);
  
  	/*
+ 	 * If the query result can be grouped, check if any grouping can be
+ 	 * performed below the top-level join. If so, initialize GroupedPathInfo
+ 	 * of base relations capable of doing the grouping and set up
+ 	 * root->grouped_var_list.
+ 	 */
+ 	add_grouping_info_to_base_rels(root);
+ 
+ 	/*
  	 * Examine any "placeholder" expressions generated during subquery pullup.
  	 * Make sure that the Vars they need are marked as needed at the relevant
  	 * join level.  This must be done before join removal because it might
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 649a233..d47f635
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** typedef struct
*** 108,117 ****
--- 108,135 ----
  	int		   *tleref_to_colnum_map;
  } grouping_sets_data;
  
+ /* Result of a given invocation of inheritance_planner_guts() */
+ typedef struct
+ {
+ 	Index 	nominalRelation;
+ 	List   *partitioned_rels;
+ 	List   *resultRelations;
+ 	List   *subpaths;
+ 	List   *subroots;
+ 	List   *withCheckOptionLists;
+ 	List   *returningLists;
+ 	List   *final_rtable;
+ 	List   *init_plans;
+ 	int		save_rel_array_size;
+ 	RelOptInfo **save_rel_array;
+ } inheritance_planner_result;
+ 
  /* Local functions */
  static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
  static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
  static void inheritance_planner(PlannerInfo *root);
+ static void inheritance_planner_guts(PlannerInfo *root,
+ 						 inheritance_planner_result *inhpres);
  static void grouping_planner(PlannerInfo *root, bool inheritance_update,
  				 double tuple_fraction);
  static grouping_sets_data *preprocess_grouping_sets(PlannerInfo *root);
*************** static void standard_qp_callback(Planner
*** 130,138 ****
  static double get_number_of_groups(PlannerInfo *root,
  					 double path_rows,
  					 grouping_sets_data *gd);
- static Size estimate_hashagg_tablesize(Path *path,
- 						   const AggClauseCosts *agg_costs,
- 						   double dNumGroups);
  static RelOptInfo *create_grouping_paths(PlannerInfo *root,
  					  RelOptInfo *input_rel,
  					  PathTarget *target,
--- 148,153 ----
*************** preprocess_phv_expression(PlannerInfo *r
*** 1020,1044 ****
  static void
  inheritance_planner(PlannerInfo *root)
  {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
  	Bitmapset  *subqueryRTindexes;
  	Bitmapset  *modifiableARIindexes;
! 	int			nominalRelation = -1;
! 	List	   *final_rtable = NIL;
! 	int			save_rel_array_size = 0;
! 	RelOptInfo **save_rel_array = NULL;
! 	List	   *subpaths = NIL;
! 	List	   *subroots = NIL;
! 	List	   *resultRelations = NIL;
! 	List	   *withCheckOptionLists = NIL;
! 	List	   *returningLists = NIL;
! 	List	   *rowMarks;
! 	RelOptInfo *final_rel;
  	ListCell   *lc;
  	Index		rti;
  	RangeTblEntry *parent_rte;
- 	List		  *partitioned_rels = NIL;
  
  	Assert(parse->commandType != CMD_INSERT);
  
--- 1035,1139 ----
  static void
  inheritance_planner(PlannerInfo *root)
  {
+ 	inheritance_planner_result inhpres;
+ 	Query	   *parse = root->parse;
+ 	RelOptInfo *final_rel;
+ 	Index		rti;
+ 	int			final_rtable_len;
+ 	ListCell   *lc;
+ 	List	   *rowMarks;
+ 
+ 	/*
+ 	 * Away we go... Although the inheritance hierarchy to be processed might
+ 	 * be represented in a non-flat manner, some of the elements needed to
+ 	 * create the final ModifyTable path are always returned in a flat list
+ 	 * structure.
+ 	 */
+ 	memset(&inhpres, 0, sizeof(inhpres));
+ 	inheritance_planner_guts(root, &inhpres);
+ 
+ 	/* Result path must go into outer query's FINAL upperrel */
+ 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
+ 
+ 	/*
+ 	 * We don't currently worry about setting final_rel's consider_parallel
+ 	 * flag in this case, nor about allowing FDWs or create_upper_paths_hook
+ 	 * to get control here.
+ 	 */
+ 
+ 	/*
+ 	 * If we managed to exclude every child rel, return a dummy plan; it
+ 	 * doesn't even need a ModifyTable node.
+ 	 */
+ 	if (inhpres.subpaths == NIL)
+ 	{
+ 		set_dummy_rel_pathlist(final_rel);
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * Put back the final adjusted rtable into the master copy of the Query.
+ 	 * (We mustn't do this if we found no non-excluded children.)
+ 	 */
+ 	parse->rtable = inhpres.final_rtable;
+ 	root->simple_rel_array_size = inhpres.save_rel_array_size;
+ 	root->simple_rel_array = inhpres.save_rel_array;
+ 	/* Must reconstruct master's simple_rte_array, too */
+ 	final_rtable_len = list_length(inhpres.final_rtable);
+ 	root->simple_rte_array = (RangeTblEntry **)
+ 								palloc0((final_rtable_len + 1) *
+ 											sizeof(RangeTblEntry *));
+ 	rti = 1;
+ 	foreach(lc, inhpres.final_rtable)
+ 	{
+ 		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
+ 
+ 		root->simple_rte_array[rti++] = rte;
+ 	}
+ 
+ 	/*
+ 	 * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
+ 	 * have dealt with fetching non-locked marked rows, else we need to have
+ 	 * ModifyTable do that.
+ 	 */
+ 	if (parse->rowMarks)
+ 		rowMarks = NIL;
+ 	else
+ 		rowMarks = root->rowMarks;
+ 
+ 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
+ 	add_path(final_rel, (Path *)
+ 			 create_modifytable_path(root, final_rel,
+ 									 parse->commandType,
+ 									 parse->canSetTag,
+ 									 inhpres.nominalRelation,
+ 									 inhpres.partitioned_rels,
+ 									 inhpres.resultRelations,
+ 									 inhpres.subpaths,
+ 									 inhpres.subroots,
+ 									 inhpres.withCheckOptionLists,
+ 									 inhpres.returningLists,
+ 									 rowMarks,
+ 									 NULL,
+ 									 SS_assign_special_param(root)), false);
+ }
+ 
+ /*
+  * inheritance_planner_guts
+  *	  Recursive guts of inheritance_planner
+  */
+ static void
+ inheritance_planner_guts(PlannerInfo *root,
+ 						 inheritance_planner_result *inhpres)
+ {
  	Query	   *parse = root->parse;
  	int			parentRTindex = parse->resultRelation;
  	Bitmapset  *subqueryRTindexes;
  	Bitmapset  *modifiableARIindexes;
! 	bool		nominalRelationSet = false;
  	ListCell   *lc;
  	Index		rti;
  	RangeTblEntry *parent_rte;
  
  	Assert(parse->commandType != CMD_INSERT);
  
*************** inheritance_planner(PlannerInfo *root)
*** 1106,1112 ****
  	 */
  	parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
! 		nominalRelation = parentRTindex;
  
  	/*
  	 * And now we can get on with generating a plan for each child table.
--- 1201,1210 ----
  	 */
  	parent_rte = rt_fetch(parentRTindex, root->parse->rtable);
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
! 	{
! 		inhpres->nominalRelation = parentRTindex;
! 		nominalRelationSet = true;
! 	}
  
  	/*
  	 * And now we can get on with generating a plan for each child table.
*************** inheritance_planner(PlannerInfo *root)
*** 1115,1120 ****
--- 1213,1219 ----
  	{
  		AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
  		PlannerInfo *subroot;
+ 		Index	childRTindex = appinfo->child_relid;
  		RangeTblEntry *child_rte;
  		RelOptInfo *sub_final_rel;
  		Path	   *subpath;
*************** inheritance_planner(PlannerInfo *root)
*** 1136,1152 ****
  		 * references to the parent RTE to refer to the current child RTE,
  		 * then fool around with subquery RTEs.
  		 */
! 		subroot->parse = (Query *)
! 			adjust_appendrel_attrs(root,
! 								   (Node *) parse,
! 								   appinfo);
  
  		/*
  		 * If there are securityQuals attached to the parent, move them to the
  		 * child rel (they've already been transformed properly for that).
  		 */
  		parent_rte = rt_fetch(parentRTindex, subroot->parse->rtable);
! 		child_rte = rt_fetch(appinfo->child_relid, subroot->parse->rtable);
  		child_rte->securityQuals = parent_rte->securityQuals;
  		parent_rte->securityQuals = NIL;
  
--- 1235,1249 ----
  		 * references to the parent RTE to refer to the current child RTE,
  		 * then fool around with subquery RTEs.
  		 */
! 		subroot->parse = (Query *) adjust_appendrel_attrs(root, (Node *) parse,
! 														  1, &appinfo);
  
  		/*
  		 * If there are securityQuals attached to the parent, move them to the
  		 * child rel (they've already been transformed properly for that).
  		 */
  		parent_rte = rt_fetch(parentRTindex, subroot->parse->rtable);
! 		child_rte = rt_fetch(childRTindex, subroot->parse->rtable);
  		child_rte->securityQuals = parent_rte->securityQuals;
  		parent_rte->securityQuals = NIL;
  
*************** inheritance_planner(PlannerInfo *root)
*** 1191,1197 ****
  		 * These won't be referenced, so there's no need to make them very
  		 * valid-looking.
  		 */
! 		while (list_length(subroot->parse->rtable) < list_length(final_rtable))
  			subroot->parse->rtable = lappend(subroot->parse->rtable,
  											 makeNode(RangeTblEntry));
  
--- 1288,1295 ----
  		 * These won't be referenced, so there's no need to make them very
  		 * valid-looking.
  		 */
! 		while (list_length(subroot->parse->rtable) <
! 										list_length(inhpres->final_rtable))
  			subroot->parse->rtable = lappend(subroot->parse->rtable,
  											 makeNode(RangeTblEntry));
  
*************** inheritance_planner(PlannerInfo *root)
*** 1203,1209 ****
  		 * since subquery RTEs couldn't contain any references to the target
  		 * rel.
  		 */
! 		if (final_rtable != NIL && subqueryRTindexes != NULL)
  		{
  			ListCell   *lr;
  
--- 1301,1307 ----
  		 * since subquery RTEs couldn't contain any references to the target
  		 * rel.
  		 */
! 		if (inhpres->final_rtable != NIL && subqueryRTindexes != NULL)
  		{
  			ListCell   *lr;
  
*************** inheritance_planner(PlannerInfo *root)
*** 1248,1253 ****
--- 1346,1392 ----
  			}
  		}
  
+ 		/*
+ 		 * Recurse for a partitioned child table.  We shouldn't be planning
+ 		 * a partitioned RTE as a child member, which is what the code after
+ 		 * this block does.
+ 		 */
+ 		if (child_rte->inh)
+ 		{
+ 			inheritance_planner_result	child_inhpres;
+ 
+ 			Assert(child_rte->relkind == RELKIND_PARTITIONED_TABLE);
+ 
+ 			/* During the recursive invocation, this child is the parent. */
+ 			subroot->parse->resultRelation = childRTindex;
+ 			memset(&child_inhpres, 0, sizeof(child_inhpres));
+ 			inheritance_planner_guts(subroot, &child_inhpres);
+ 
+ 			inhpres->partitioned_rels = list_concat(inhpres->partitioned_rels,
+ 											child_inhpres.partitioned_rels);
+ 			inhpres->resultRelations = list_concat(inhpres->resultRelations,
+ 											child_inhpres.resultRelations);
+ 			inhpres->subpaths = list_concat(inhpres->subpaths,
+ 											child_inhpres.subpaths);
+ 			inhpres->subroots = list_concat(inhpres->subroots,
+ 											child_inhpres.subroots);
+ 			inhpres->withCheckOptionLists =
+ 									list_concat(inhpres->withCheckOptionLists,
+ 										child_inhpres.withCheckOptionLists);
+ 			inhpres->returningLists = list_concat(inhpres->returningLists,
+ 											child_inhpres.returningLists);
+ 			if (child_inhpres.final_rtable != NIL)
+ 				inhpres->final_rtable = child_inhpres.final_rtable;
+ 			if (child_inhpres.init_plans != NIL)
+ 				inhpres->init_plans = child_inhpres.init_plans;
+ 			if (child_inhpres.save_rel_array_size != 0)
+ 			{
+ 				inhpres->save_rel_array_size = child_inhpres.save_rel_array_size;
+ 				inhpres->save_rel_array = child_inhpres.save_rel_array;
+ 			}
+ 			continue;
+ 		}
+ 
  		/* There shouldn't be any OJ info to translate, as yet */
  		Assert(subroot->join_info_list == NIL);
  		/* and we haven't created PlaceHolderInfos, either */
*************** inheritance_planner(PlannerInfo *root)
*** 1279,1286 ****
  		 * the duplicate child RTE added for the parent does not appear
  		 * anywhere else in the plan tree.
  		 */
! 		if (nominalRelation < 0)
! 			nominalRelation = appinfo->child_relid;
  
  		/*
  		 * Select cheapest path in case there's more than one.  We always run
--- 1418,1428 ----
  		 * the duplicate child RTE added for the parent does not appear
  		 * anywhere else in the plan tree.
  		 */
! 		if (!nominalRelationSet)
! 		{
! 			inhpres->nominalRelation = childRTindex;
! 			nominalRelationSet = true;
! 		}
  
  		/*
  		 * Select cheapest path in case there's more than one.  We always run
*************** inheritance_planner(PlannerInfo *root)
*** 1303,1314 ****
  		 * becomes the initial contents of final_rtable; otherwise, append
  		 * just its modified subquery RTEs to final_rtable.
  		 */
! 		if (final_rtable == NIL)
! 			final_rtable = subroot->parse->rtable;
  		else
! 			final_rtable = list_concat(final_rtable,
! 									   list_copy_tail(subroot->parse->rtable,
! 												 list_length(final_rtable)));
  
  		/*
  		 * We need to collect all the RelOptInfos from all child plans into
--- 1445,1456 ----
  		 * becomes the initial contents of final_rtable; otherwise, append
  		 * just its modified subquery RTEs to final_rtable.
  		 */
! 		if (inhpres->final_rtable == NIL)
! 			inhpres->final_rtable = subroot->parse->rtable;
  		else
! 			inhpres->final_rtable = list_concat(inhpres->final_rtable,
! 										list_copy_tail(subroot->parse->rtable,
! 										 list_length(inhpres->final_rtable)));
  
  		/*
  		 * We need to collect all the RelOptInfos from all child plans into
*************** inheritance_planner(PlannerInfo *root)
*** 1317,1425 ****
  		 * have to propagate forward the RelOptInfos that were already built
  		 * in previous children.
  		 */
! 		Assert(subroot->simple_rel_array_size >= save_rel_array_size);
! 		for (rti = 1; rti < save_rel_array_size; rti++)
  		{
! 			RelOptInfo *brel = save_rel_array[rti];
  
  			if (brel)
  				subroot->simple_rel_array[rti] = brel;
  		}
! 		save_rel_array_size = subroot->simple_rel_array_size;
! 		save_rel_array = subroot->simple_rel_array;
  
  		/* Make sure any initplans from this rel get into the outer list */
! 		root->init_plans = subroot->init_plans;
  
  		/* Build list of sub-paths */
! 		subpaths = lappend(subpaths, subpath);
  
  		/* Build list of modified subroots, too */
! 		subroots = lappend(subroots, subroot);
  
  		/* Build list of target-relation RT indexes */
! 		resultRelations = lappend_int(resultRelations, appinfo->child_relid);
  
  		/* Build lists of per-relation WCO and RETURNING targetlists */
  		if (parse->withCheckOptions)
! 			withCheckOptionLists = lappend(withCheckOptionLists,
! 										   subroot->parse->withCheckOptions);
  		if (parse->returningList)
! 			returningLists = lappend(returningLists,
! 									 subroot->parse->returningList);
! 
  		Assert(!parse->onConflict);
  	}
  
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		partitioned_rels = get_partitioned_child_rels(root, parentRTindex);
  		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(partitioned_rels) >= 1);
! 	}
! 
! 	/* Result path must go into outer query's FINAL upperrel */
! 	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
! 
! 	/*
! 	 * We don't currently worry about setting final_rel's consider_parallel
! 	 * flag in this case, nor about allowing FDWs or create_upper_paths_hook
! 	 * to get control here.
! 	 */
! 
! 	/*
! 	 * If we managed to exclude every child rel, return a dummy plan; it
! 	 * doesn't even need a ModifyTable node.
! 	 */
! 	if (subpaths == NIL)
! 	{
! 		set_dummy_rel_pathlist(final_rel);
! 		return;
! 	}
! 
! 	/*
! 	 * Put back the final adjusted rtable into the master copy of the Query.
! 	 * (We mustn't do this if we found no non-excluded children.)
! 	 */
! 	parse->rtable = final_rtable;
! 	root->simple_rel_array_size = save_rel_array_size;
! 	root->simple_rel_array = save_rel_array;
! 	/* Must reconstruct master's simple_rte_array, too */
! 	root->simple_rte_array = (RangeTblEntry **)
! 		palloc0((list_length(final_rtable) + 1) * sizeof(RangeTblEntry *));
! 	rti = 1;
! 	foreach(lc, final_rtable)
! 	{
! 		RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
! 
! 		root->simple_rte_array[rti++] = rte;
  	}
- 
- 	/*
- 	 * If there was a FOR [KEY] UPDATE/SHARE clause, the LockRows node will
- 	 * have dealt with fetching non-locked marked rows, else we need to have
- 	 * ModifyTable do that.
- 	 */
- 	if (parse->rowMarks)
- 		rowMarks = NIL;
- 	else
- 		rowMarks = root->rowMarks;
- 
- 	/* Create Path representing a ModifyTable to do the UPDATE/DELETE work */
- 	add_path(final_rel, (Path *)
- 			 create_modifytable_path(root, final_rel,
- 									 parse->commandType,
- 									 parse->canSetTag,
- 									 nominalRelation,
- 									 partitioned_rels,
- 									 resultRelations,
- 									 subpaths,
- 									 subroots,
- 									 withCheckOptionLists,
- 									 returningLists,
- 									 rowMarks,
- 									 NULL,
- 									 SS_assign_special_param(root)));
  }
  
  /*--------------------
--- 1459,1506 ----
  		 * have to propagate forward the RelOptInfos that were already built
  		 * in previous children.
  		 */
! 		Assert(subroot->simple_rel_array_size >= inhpres->save_rel_array_size);
! 		for (rti = 1; rti < inhpres->save_rel_array_size; rti++)
  		{
! 			RelOptInfo *brel = inhpres->save_rel_array[rti];
  
  			if (brel)
  				subroot->simple_rel_array[rti] = brel;
  		}
! 		inhpres->save_rel_array_size = subroot->simple_rel_array_size;
! 		inhpres->save_rel_array = subroot->simple_rel_array;
  
  		/* Make sure any initplans from this rel get into the outer list */
! 		inhpres->init_plans = subroot->init_plans;
  
  		/* Build list of sub-paths */
! 		inhpres->subpaths = lappend(inhpres->subpaths, subpath);
  
  		/* Build list of modified subroots, too */
! 		inhpres->subroots = lappend(inhpres->subroots, subroot);
  
  		/* Build list of target-relation RT indexes */
! 		inhpres->resultRelations = lappend_int(inhpres->resultRelations,
! 											   childRTindex);
  
  		/* Build lists of per-relation WCO and RETURNING targetlists */
  		if (parse->withCheckOptions)
! 			inhpres->withCheckOptionLists =
! 										lappend(inhpres->withCheckOptionLists,
! 											subroot->parse->withCheckOptions);
  		if (parse->returningList)
! 			inhpres->returningLists = lappend(inhpres->returningLists,
! 											  subroot->parse->returningList);
  		Assert(!parse->onConflict);
  	}
  
  	if (parent_rte->relkind == RELKIND_PARTITIONED_TABLE)
  	{
! 		inhpres->partitioned_rels = get_partitioned_child_rels(root,
! 															parentRTindex);
  		/* The root partitioned table is included as a child rel */
! 		Assert(list_length(inhpres->partitioned_rels) >= 1);
  	}
  }
  
  /*--------------------
*************** grouping_planner(PlannerInfo *root, bool
*** 2040,2046 ****
  		}
  
  		/* And shove it into final_rel */
! 		add_path(final_rel, path);
  	}
  
  	/*
--- 2121,2127 ----
  		}
  
  		/* And shove it into final_rel */
! 		add_path(final_rel, path, false);
  	}
  
  	/*
*************** get_number_of_groups(PlannerInfo *root,
*** 3446,3485 ****
  }
  
  /*
-  * estimate_hashagg_tablesize
-  *	  estimate the number of bytes that a hash aggregate hashtable will
-  *	  require based on the agg_costs, path width and dNumGroups.
-  *
-  * XXX this may be over-estimating the size now that hashagg knows to omit
-  * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
-  * grouping columns not in the hashed set are counted here even though hashagg
-  * won't store them. Is this a problem?
-  */
- static Size
- estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
- 						   double dNumGroups)
- {
- 	Size		hashentrysize;
- 
- 	/* Estimate per-hash-entry space at tuple width... */
- 	hashentrysize = MAXALIGN(path->pathtarget->width) +
- 		MAXALIGN(SizeofMinimalTupleHeader);
- 
- 	/* plus space for pass-by-ref transition values... */
- 	hashentrysize += agg_costs->transitionSpace;
- 	/* plus the per-hash-entry overhead */
- 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
- 
- 	/*
- 	 * Note that this disregards the effect of fill-factor and growth policy
- 	 * of the hash-table. That's probably ok, given default the default
- 	 * fill-factor is relatively high. It'd be hard to meaningfully factor in
- 	 * "double-in-size" growth policies here.
- 	 */
- 	return hashentrysize * dNumGroups;
- }
- 
- /*
   * create_grouping_paths
   *
   * Build a new upperrel containing Paths for grouping and/or aggregation.
--- 3527,3532 ----
*************** create_grouping_paths(PlannerInfo *root,
*** 3600,3606 ****
  								   (List *) parse->havingQual);
  		}
  
! 		add_path(grouped_rel, path);
  
  		/* No need to consider any other alternatives. */
  		set_cheapest(grouped_rel);
--- 3647,3653 ----
  								   (List *) parse->havingQual);
  		}
  
! 		add_path(grouped_rel, path, false);
  
  		/* No need to consider any other alternatives. */
  		set_cheapest(grouped_rel);
*************** create_grouping_paths(PlannerInfo *root,
*** 3777,3783 ****
  														 parse->groupClause,
  														 NIL,
  														 &agg_partial_costs,
! 														 dNumPartialGroups));
  					else
  						add_partial_path(grouped_rel, (Path *)
  										 create_group_path(root,
--- 3824,3831 ----
  														 parse->groupClause,
  														 NIL,
  														 &agg_partial_costs,
! 														 dNumPartialGroups),
! 							false);
  					else
  						add_partial_path(grouped_rel, (Path *)
  										 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 3786,3792 ****
  													 partial_grouping_target,
  														   parse->groupClause,
  														   NIL,
! 														 dNumPartialGroups));
  				}
  			}
  		}
--- 3834,3841 ----
  													 partial_grouping_target,
  														   parse->groupClause,
  														   NIL,
! 														   dNumPartialGroups),
! 										 false);
  				}
  			}
  		}
*************** create_grouping_paths(PlannerInfo *root,
*** 3817,3823 ****
  												 parse->groupClause,
  												 NIL,
  												 &agg_partial_costs,
! 												 dNumPartialGroups));
  			}
  		}
  	}
--- 3866,3873 ----
  												 parse->groupClause,
  												 NIL,
  												 &agg_partial_costs,
! 												 dNumPartialGroups),
! 								 false);
  			}
  		}
  	}
*************** create_grouping_paths(PlannerInfo *root,
*** 3869,3875 ****
  											 parse->groupClause,
  											 (List *) parse->havingQual,
  											 agg_costs,
! 											 dNumGroups));
  				}
  				else if (parse->groupClause)
  				{
--- 3919,3925 ----
  											 parse->groupClause,
  											 (List *) parse->havingQual,
  											 agg_costs,
! 											 dNumGroups), false);
  				}
  				else if (parse->groupClause)
  				{
*************** create_grouping_paths(PlannerInfo *root,
*** 3884,3890 ****
  											   target,
  											   parse->groupClause,
  											   (List *) parse->havingQual,
! 											   dNumGroups));
  				}
  				else
  				{
--- 3934,3940 ----
  											   target,
  											   parse->groupClause,
  											   (List *) parse->havingQual,
! 											   dNumGroups), false);
  				}
  				else
  				{
*************** create_grouping_paths(PlannerInfo *root,
*** 3933,3939 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups));
  			else
  				add_path(grouped_rel, (Path *)
  						 create_group_path(root,
--- 3983,3989 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups), false);
  			else
  				add_path(grouped_rel, (Path *)
  						 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 3942,3948 ****
  										   target,
  										   parse->groupClause,
  										   (List *) parse->havingQual,
! 										   dNumGroups));
  
  			/*
  			 * The point of using Gather Merge rather than Gather is that it
--- 3992,3998 ----
  										   target,
  										   parse->groupClause,
  										   (List *) parse->havingQual,
! 										   dNumGroups), false);
  
  			/*
  			 * The point of using Gather Merge rather than Gather is that it
*************** create_grouping_paths(PlannerInfo *root,
*** 3995,4001 ****
  												 parse->groupClause,
  												 (List *) parse->havingQual,
  												 &agg_final_costs,
! 												 dNumGroups));
  					else
  						add_path(grouped_rel, (Path *)
  								 create_group_path(root,
--- 4045,4051 ----
  												 parse->groupClause,
  												 (List *) parse->havingQual,
  												 &agg_final_costs,
! 												 dNumGroups), false);
  					else
  						add_path(grouped_rel, (Path *)
  								 create_group_path(root,
*************** create_grouping_paths(PlannerInfo *root,
*** 4004,4010 ****
  												   target,
  												   parse->groupClause,
  												   (List *) parse->havingQual,
! 												   dNumGroups));
  				}
  			}
  		}
--- 4054,4060 ----
  												   target,
  												   parse->groupClause,
  												   (List *) parse->havingQual,
! 												   dNumGroups), false);
  				}
  			}
  		}
*************** create_grouping_paths(PlannerInfo *root,
*** 4049,4055 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 agg_costs,
! 										 dNumGroups));
  			}
  		}
  
--- 4099,4105 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 agg_costs,
! 										 dNumGroups), false);
  			}
  		}
  
*************** create_grouping_paths(PlannerInfo *root,
*** 4087,4095 ****
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups));
  			}
  		}
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
--- 4137,4212 ----
  										 parse->groupClause,
  										 (List *) parse->havingQual,
  										 &agg_final_costs,
! 										 dNumGroups), false);
  			}
  		}
+ 
+ 		/*
+ 		 * If input_rel has partially aggregated partial paths, gather them
+ 		 * and perform the final aggregation.
+ 		 *
+ 		 * TODO Allow havingQual - currently not supported at base relation
+ 		 * level.
+ 		 */
+ 		if (input_rel->gpi != NULL &&
+ 			input_rel->gpi->partial_pathlist != NIL &&
+ 			!parse->havingQual)
+ 		{
+ 			Path	   *path = (Path *) linitial(input_rel->gpi->partial_pathlist);
+ 			double		total_groups = path->rows * path->parallel_workers;
+ 
+ 			path = (Path *) create_gather_path(root,
+ 											   input_rel,
+ 											   path,
+ 											   path->pathtarget,
+ 											   NULL,
+ 											   &total_groups);
+ 
+ 			/*
+ 			 * The input path is partially aggregated and the final
+ 			 * aggregation - if the path wins - will be done below. So we're
+ 			 * done with it for now.
+ 			 *
+ 			 * The top-level grouped_rel needs to receive the path into the
+ 			 * regular pathlist, as opposed to grouped_rel->gpi->pathlist.
+ 			 */
+ 			add_path(input_rel, path, false);
+ 		}
+ 
+ 		/*
+ 		 * If input_rel has partially aggregated paths, perform the final
+ 		 * aggregation.
+ 		 *
+ 		 * TODO Allow havingQual - currently not supported at base relation
+ 		 * level.
+ 		 */
+ 		if (input_rel->gpi != NULL && input_rel->gpi->pathlist != NIL &&
+ 			!parse->havingQual)
+ 		{
+ 			Path *pre_agg = (Path *) linitial(input_rel->gpi->pathlist);
+ 
+ 			dNumGroups = get_number_of_groups(root, pre_agg->rows, gd);
+ 
+ 			MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
+ 			get_agg_clause_costs(root, (Node *) target->exprs,
+ 								 AGGSPLIT_FINAL_DESERIAL,
+ 								 &agg_final_costs);
+ 			get_agg_clause_costs(root, parse->havingQual,
+ 								 AGGSPLIT_FINAL_DESERIAL,
+ 								 &agg_final_costs);
+ 
+ 			add_path(grouped_rel,
+ 					 (Path *) create_agg_path(root, grouped_rel,
+ 											  pre_agg,
+ 											  target,
+ 											  AGG_HASHED,
+ 											  AGGSPLIT_FINAL_DESERIAL,
+ 											  parse->groupClause,
+ 											  (List *) parse->havingQual,
+ 											  &agg_final_costs,
+ 											  dNumGroups),
+ 					 false);
+ 		}
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
*************** consider_groupingsets_paths(PlannerInfo
*** 4289,4295 ****
  										  strat,
  										  new_rollups,
  										  agg_costs,
! 										  dNumGroups));
  		return;
  	}
  
--- 4406,4412 ----
  										  strat,
  										  new_rollups,
  										  agg_costs,
! 										  dNumGroups), false);
  		return;
  	}
  
*************** consider_groupingsets_paths(PlannerInfo
*** 4447,4453 ****
  											  AGG_MIXED,
  											  rollups,
  											  agg_costs,
! 											  dNumGroups));
  		}
  	}
  
--- 4564,4570 ----
  											  AGG_MIXED,
  											  rollups,
  											  agg_costs,
! 											  dNumGroups), false);
  		}
  	}
  
*************** consider_groupingsets_paths(PlannerInfo
*** 4464,4470 ****
  										  AGG_SORTED,
  										  gd->rollups,
  										  agg_costs,
! 										  dNumGroups));
  }
  
  /*
--- 4581,4587 ----
  										  AGG_SORTED,
  										  gd->rollups,
  										  agg_costs,
! 										  dNumGroups), false);
  }
  
  /*
*************** create_one_window_path(PlannerInfo *root
*** 4649,4655 ****
  								  window_pathkeys);
  	}
  
! 	add_path(window_rel, path);
  }
  
  /*
--- 4766,4772 ----
  								  window_pathkeys);
  	}
  
! 	add_path(window_rel, path, false);
  }
  
  /*
*************** create_distinct_paths(PlannerInfo *root,
*** 4755,4761 ****
  						 create_upper_unique_path(root, distinct_rel,
  												  path,
  										list_length(root->distinct_pathkeys),
! 												  numDistinctRows));
  			}
  		}
  
--- 4872,4878 ----
  						 create_upper_unique_path(root, distinct_rel,
  												  path,
  										list_length(root->distinct_pathkeys),
! 												  numDistinctRows), false);
  			}
  		}
  
*************** create_distinct_paths(PlannerInfo *root,
*** 4782,4788 ****
  				 create_upper_unique_path(root, distinct_rel,
  										  path,
  										list_length(root->distinct_pathkeys),
! 										  numDistinctRows));
  	}
  
  	/*
--- 4899,4905 ----
  				 create_upper_unique_path(root, distinct_rel,
  										  path,
  										list_length(root->distinct_pathkeys),
! 										  numDistinctRows), false);
  	}
  
  	/*
*************** create_distinct_paths(PlannerInfo *root,
*** 4829,4835 ****
  								 parse->distinctClause,
  								 NIL,
  								 NULL,
! 								 numDistinctRows));
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
--- 4946,4952 ----
  								 parse->distinctClause,
  								 NIL,
  								 NULL,
! 								 numDistinctRows), false);
  	}
  
  	/* Give a helpful error if we failed to find any implementation */
*************** create_ordered_paths(PlannerInfo *root,
*** 4927,4933 ****
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path);
  		}
  	}
  
--- 5044,5050 ----
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path, false);
  		}
  	}
  
*************** create_ordered_paths(PlannerInfo *root,
*** 4977,4983 ****
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path);
  		}
  	}
  
--- 5094,5100 ----
  				path = apply_projection_to_path(root, ordered_rel,
  												path, target);
  
! 			add_path(ordered_rel, path, false);
  		}
  	}
  
*************** get_partitioned_child_rels(PlannerInfo *
*** 6083,6085 ****
--- 6200,6230 ----
  
  	return result;
  }
+ 
+ /*
+  * get_partitioned_child_rels_for_join
+  *		Build and return a list containing the RTI of every partitioned
+  *		relation which is a child of some rel included in the join.
+  *
+  * Note: Only call this function on joins between partitioned tables.
+  */
+ List *
+ get_partitioned_child_rels_for_join(PlannerInfo *root,
+ 									RelOptInfo *joinrel)
+ {
+ 	List	   *result = NIL;
+ 	ListCell   *l;
+ 
+ 	foreach(l, root->pcinfo_list)
+ 	{
+ 		PartitionedChildRelInfo	*pc = lfirst(l);
+ 
+ 		if (bms_is_member(pc->parent_relid, joinrel->relids))
+ 			result = list_concat(result, list_copy(pc->child_rels));
+ 	}
+ 
+ 	/* The root partitioned table is included as a child rel */
+ 	Assert(list_length(result) >= bms_num_members(joinrel->relids));
+ 
+ 	return result;
+ }
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
new file mode 100644
index 1278371..44c3919
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** typedef struct
*** 40,46 ****
--- 40,50 ----
  	List	   *tlist;			/* underlying target list */
  	int			num_vars;		/* number of plain Var tlist entries */
  	bool		has_ph_vars;	/* are there PlaceHolderVar entries? */
+ 	bool		has_grp_vars;	/* are there GroupedVar entries? */
  	bool		has_non_vars;	/* are there other entries? */
+ 	bool		has_conv_whole_rows;	/* are there ConvertRowtypeExpr entries
+ 										 * encapsulating a whole-row Var?
+ 										 */
  	tlist_vinfo vars[FLEXIBLE_ARRAY_MEMBER];	/* has num_vars entries */
  } indexed_tlist;
  
*************** static List *set_returning_clause_refere
*** 139,144 ****
--- 143,149 ----
  								int rtoffset);
  static bool extract_query_dependencies_walker(Node *node,
  								  PlannerInfo *context);
+ static Var *get_wholerow_ref_from_convert_row_type(Node *node);
  
  /*****************************************************************************
   *
*************** set_upper_references(PlannerInfo *root,
*** 1725,1733 ****
--- 1730,1781 ----
  	indexed_tlist *subplan_itlist;
  	List	   *output_targetlist;
  	ListCell   *l;
+ 	List	*sub_tlist_save = NIL;
+ 
+ 	if (root->grouped_var_list != NIL)
+ 	{
+ 		if (IsA(plan, Agg))
+ 		{
+ 			Agg	*agg = (Agg *) plan;
+ 
+ 			if (agg->aggsplit == AGGSPLIT_FINAL_DESERIAL)
+ 			{
+ 				/*
+ 				 * convert_combining_aggrefs could have replaced some vars
+ 				 * with Aggref expressions representing the partial
+ 				 * aggregation. We need to restore the same Aggrefs in the
+ 				 * subplan targetlist, but this would break the subplan if
+ 				 * it's something else than the partial aggregation (i.e. the
+ 				 * partial aggregation takes place lower in the plan tree). So
+ 				 * we'll eventually need to restore the original list.
+ 				 */
+ 				if (!IsA(subplan, Agg))
+ 					sub_tlist_save = subplan->targetlist;
+ #ifdef USE_ASSERT_CHECKING
+ 				else
+ 					Assert(((Agg *) subplan)->aggsplit == AGGSPLIT_INITIAL_SERIAL);
+ #endif	/* USE_ASSERT_CHECKING */
+ 
+ 				/*
+ 				 * Restore the aggregate expressions that we might have
+ 				 * removed when planning for aggregation at base relation
+ 				 * level.
+ 				 */
+ 				subplan->targetlist =
+ 					restore_grouping_expressions(root, subplan->targetlist);
+ 			}
+ 		}
+ 	}
  
  	subplan_itlist = build_tlist_index(subplan->targetlist);
  
+ 	/*
+ 	 * The replacement of GroupVars by Aggrefs was only needed for the index
+ 	 * build.
+ 	 */
+ 	if (sub_tlist_save != NIL)
+ 		subplan->targetlist = sub_tlist_save;
+ 
  	output_targetlist = NIL;
  	foreach(l, plan->targetlist)
  	{
*************** build_tlist_index(List *tlist)
*** 1937,1943 ****
--- 1985,1993 ----
  
  	itlist->tlist = tlist;
  	itlist->has_ph_vars = false;
+ 	itlist->has_grp_vars = false;
  	itlist->has_non_vars = false;
+ 	itlist->has_conv_whole_rows = false;
  
  	/* Find the Vars and fill in the index array */
  	vinfo = itlist->vars;
*************** build_tlist_index(List *tlist)
*** 1956,1961 ****
--- 2006,2015 ----
  		}
  		else if (tle->expr && IsA(tle->expr, PlaceHolderVar))
  			itlist->has_ph_vars = true;
+ 		else if (tle->expr && IsA(tle->expr, GroupedVar))
+ 			itlist->has_grp_vars = true;
+ 		else if (get_wholerow_ref_from_convert_row_type((Node *) tle->expr))
+ 			itlist->has_conv_whole_rows = true;
  		else
  			itlist->has_non_vars = true;
  	}
*************** build_tlist_index(List *tlist)
*** 1971,1977 ****
   * This is like build_tlist_index, but we only index tlist entries that
   * are Vars belonging to some rel other than the one specified.  We will set
   * has_ph_vars (allowing PlaceHolderVars to be matched), but not has_non_vars
!  * (so nothing other than Vars and PlaceHolderVars can be matched).
   */
  static indexed_tlist *
  build_tlist_index_other_vars(List *tlist, Index ignore_rel)
--- 2025,2034 ----
   * This is like build_tlist_index, but we only index tlist entries that
   * are Vars belonging to some rel other than the one specified.  We will set
   * has_ph_vars (allowing PlaceHolderVars to be matched), but not has_non_vars
!  * (so nothing other than Vars and PlaceHolderVars can be matched). In case of
!  * DML, where this function is used, RETURNING lists from child relations are
!  * appended as for a simple append relation. That does not require fixing
!  * ConvertRowtypeExpr references, so those are not considered here.
   */
  static indexed_tlist *
  build_tlist_index_other_vars(List *tlist, Index ignore_rel)
*************** build_tlist_index_other_vars(List *tlist
*** 1988,1993 ****
--- 2045,2051 ----
  	itlist->tlist = tlist;
  	itlist->has_ph_vars = false;
  	itlist->has_non_vars = false;
+ 	itlist->has_conv_whole_rows = false;
  
  	/* Find the desired Vars and fill in the index array */
  	vinfo = itlist->vars;
*************** fix_join_expr_mutator(Node *node, fix_jo
*** 2233,2238 ****
--- 2291,2321 ----
  		/* No referent found for Var */
  		elog(ERROR, "variable not found in subplan target lists");
  	}
+ 	if (IsA(node, GroupedVar))
+ 	{
+ 		GroupedVar *gvar = (GroupedVar *) node;
+ 
+ 		/* See if the GroupedVar has bubbled up from a lower plan node */
+ 		if (context->outer_itlist && context->outer_itlist->has_grp_vars)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) gvar,
+ 													  context->outer_itlist,
+ 													  OUTER_VAR);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 		if (context->inner_itlist && context->inner_itlist->has_grp_vars)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) gvar,
+ 													  context->inner_itlist,
+ 													  INNER_VAR);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 
+ 		/* No referent found for GroupedVar */
+ 		elog(ERROR, "grouped variable not found in subplan target lists");
+ 	}
  	if (IsA(node, PlaceHolderVar))
  	{
  		PlaceHolderVar *phv = (PlaceHolderVar *) node;
*************** fix_join_expr_mutator(Node *node, fix_jo
*** 2258,2263 ****
--- 2341,2369 ----
  		/* If not supplied by input plans, evaluate the contained expr */
  		return fix_join_expr_mutator((Node *) phv->phexpr, context);
  	}
+ 	if (get_wholerow_ref_from_convert_row_type(node))
+ 	{
+ 		if (context->outer_itlist &&
+ 			context->outer_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													 context->outer_itlist,
+ 																OUTER_VAR);
+ 
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 		if (context->inner_itlist &&
+ 			context->inner_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													 context->inner_itlist,
+ 																INNER_VAR);
+ 
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 	}
  	if (IsA(node, Param))
  		return fix_param_node(context->root, (Param *) node);
  	/* Try matching more complex expressions too, if tlists have any */
*************** fix_upper_expr_mutator(Node *node, fix_u
*** 2364,2369 ****
--- 2470,2486 ----
  		/* If not supplied by input plan, evaluate the contained expr */
  		return fix_upper_expr_mutator((Node *) phv->phexpr, context);
  	}
+ 	if (get_wholerow_ref_from_convert_row_type(node))
+ 	{
+ 		if (context->subplan_itlist->has_conv_whole_rows)
+ 		{
+ 			newvar = search_indexed_tlist_for_non_var((Expr *) node,
+ 													  context->subplan_itlist,
+ 													  context->newvarno);
+ 			if (newvar)
+ 				return (Node *) newvar;
+ 		}
+ 	}
  	if (IsA(node, Param))
  		return fix_param_node(context->root, (Param *) node);
  	if (IsA(node, Aggref))
*************** fix_upper_expr_mutator(Node *node, fix_u
*** 2389,2395 ****
  		/* If no match, just fall through to process it normally */
  	}
  	/* Try matching more complex expressions too, if tlist has any */
! 	if (context->subplan_itlist->has_non_vars)
  	{
  		newvar = search_indexed_tlist_for_non_var((Expr *) node,
  												  context->subplan_itlist,
--- 2506,2513 ----
  		/* If no match, just fall through to process it normally */
  	}
  	/* Try matching more complex expressions too, if tlist has any */
! 	if (context->subplan_itlist->has_grp_vars ||
! 		context->subplan_itlist->has_non_vars)
  	{
  		newvar = search_indexed_tlist_for_non_var((Expr *) node,
  												  context->subplan_itlist,
*************** extract_query_dependencies_walker(Node *
*** 2596,2598 ****
--- 2714,2748 ----
  	return expression_tree_walker(node, extract_query_dependencies_walker,
  								  (void *) context);
  }
+ 
+ /*
+  * get_wholerow_ref_from_convert_row_type
+  *		Given a node, check if it's a ConvertRowtypeExpr encapsulating a
+  *		whole-row reference as implicit cast and return the whole-row
+  *		reference Var if so. Otherwise return NULL. In case of multi-level
+  *		partitioning, we will have as many nested ConvertRowtypeExprs as there
+  *		are levels in the partition hierarchy.
+  */
+ static Var *
+ get_wholerow_ref_from_convert_row_type(Node *node)
+ {
+ 	Var		   *var = NULL;
+ 	ConvertRowtypeExpr *convexpr;
+ 
+ 	if (!node || !IsA(node, ConvertRowtypeExpr))
+ 		return NULL;
+ 
+ 	/* Traverse nested ConvertRowtypeExpr's. */
+ 	convexpr = castNode(ConvertRowtypeExpr, node);
+ 	while (convexpr->convertformat == COERCE_IMPLICIT_CAST &&
+ 		   IsA(convexpr->arg, ConvertRowtypeExpr))
+ 		convexpr = (ConvertRowtypeExpr *) convexpr->arg;
+ 
+ 	if (IsA(convexpr->arg, Var))
+ 		var = castNode(Var, convexpr->arg);
+ 
+ 	if (var && var->varattno == 0)
+ 		return var;
+ 
+ 	return NULL;
+ }
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
new file mode 100644
index a1be858..8bdaa44
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
***************
*** 55,61 ****
  typedef struct
  {
  	PlannerInfo *root;
! 	AppendRelInfo *appinfo;
  } adjust_appendrel_attrs_context;
  
  static Path *recurse_set_operations(Node *setOp, PlannerInfo *root,
--- 55,62 ----
  typedef struct
  {
  	PlannerInfo *root;
! 	int		nappinfos;
! 	AppendRelInfo **appinfos;
  } adjust_appendrel_attrs_context;
  
  static Path *recurse_set_operations(Node *setOp, PlannerInfo *root,
*************** static List *generate_append_tlist(List
*** 97,103 ****
  					  List *input_tlists,
  					  List *refnames_tlist);
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
! static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
  static void make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
--- 98,104 ----
  					  List *input_tlists,
  					  List *refnames_tlist);
  static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
! static List *expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
  						 Index rti);
  static void make_inh_translation_list(Relation oldrelation,
  						  Relation newrelation,
*************** static Bitmapset *translate_col_privs(co
*** 107,113 ****
  					List *translated_vars);
  static Node *adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context);
- static Relids adjust_relid_set(Relids relids, Index oldrelid, Index newrelid);
  static List *adjust_inherited_tlist(List *tlist,
  					   AppendRelInfo *context);
  
--- 108,113 ----
*************** plan_set_operations(PlannerInfo *root)
*** 207,213 ****
  	root->processed_tlist = top_tlist;
  
  	/* Add only the final path to the SETOP upperrel. */
! 	add_path(setop_rel, path);
  
  	/* Let extensions possibly add some more paths */
  	if (create_upper_paths_hook)
--- 207,213 ----
  	root->processed_tlist = top_tlist;
  
  	/* Add only the final path to the SETOP upperrel. */
! 	add_path(setop_rel, path, false);
  
  	/* Let extensions possibly add some more paths */
  	if (create_upper_paths_hook)
*************** expand_inherited_tables(PlannerInfo *roo
*** 1330,1348 ****
  	Index		nrtes;
  	Index		rti;
  	ListCell   *rl;
  
  	/*
  	 * expand_inherited_rtentry may add RTEs to parse->rtable; there is no
  	 * need to scan them since they can't have inh=true.  So just scan as far
  	 * as the original end of the rtable list.
  	 */
! 	nrtes = list_length(root->parse->rtable);
! 	rl = list_head(root->parse->rtable);
  	for (rti = 1; rti <= nrtes; rti++)
  	{
  		RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);
  
! 		expand_inherited_rtentry(root, rte, rti);
  		rl = lnext(rl);
  	}
  }
--- 1330,1351 ----
  	Index		nrtes;
  	Index		rti;
  	ListCell   *rl;
+ 	Query	   *parse = root->parse;
  
  	/*
  	 * expand_inherited_rtentry may add RTEs to parse->rtable; there is no
  	 * need to scan them since they can't have inh=true.  So just scan as far
  	 * as the original end of the rtable list.
  	 */
! 	nrtes = list_length(parse->rtable);
! 	rl = list_head(parse->rtable);
  	for (rti = 1; rti <= nrtes; rti++)
  	{
  		RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);
+ 		List		  *appinfos;
  
! 		appinfos = expand_inherited_rtentry(root, rte, rti);
! 		root->append_rel_list = list_concat(root->append_rel_list, appinfos);
  		rl = lnext(rl);
  	}
  }
*************** expand_inherited_tables(PlannerInfo *roo
*** 1362,1369 ****
   *
   * A childless table is never considered to be an inheritance set; therefore
   * a parent RTE must always have at least two associated AppendRelInfos.
   */
! static void
  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  {
  	Query	   *parse = root->parse;
--- 1365,1374 ----
   *
   * A childless table is never considered to be an inheritance set; therefore
   * a parent RTE must always have at least two associated AppendRelInfos.
+  *
+  * Returns a list of AppendRelInfos, or NIL.
   */
! static List *
  expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
  {
  	Query	   *parse = root->parse;
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1380,1391 ****
  
  	/* Does RT entry allow inheritance? */
  	if (!rte->inh)
! 		return;
  	/* Ignore any already-expanded UNION ALL nodes */
  	if (rte->rtekind != RTE_RELATION)
  	{
  		Assert(rte->rtekind == RTE_SUBQUERY);
! 		return;
  	}
  	/* Fast path for common case of childless table */
  	parentOID = rte->relid;
--- 1385,1396 ----
  
  	/* Does RT entry allow inheritance? */
  	if (!rte->inh)
! 		return NIL;
  	/* Ignore any already-expanded UNION ALL nodes */
  	if (rte->rtekind != RTE_RELATION)
  	{
  		Assert(rte->rtekind == RTE_SUBQUERY);
! 		return NIL;
  	}
  	/* Fast path for common case of childless table */
  	parentOID = rte->relid;
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1393,1399 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1398,1404 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1417,1424 ****
  	else
  		lockmode = AccessShareLock;
  
! 	/* Scan for all members of inheritance set, acquire needed locks */
! 	inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
  
  	/*
  	 * Check that there's at least one descendant, else treat as no-child
--- 1422,1440 ----
  	else
  		lockmode = AccessShareLock;
  
! 	/*
! 	 * Expand a partitioned table level-wise to help optimizations like
! 	 * partition-wise join, which match partitions at every level. Otherwise,
! 	 * scan for all members of the inheritance set. Acquire needed locks.
! 	 */
! 	if (rte->relkind == RELKIND_PARTITIONED_TABLE)
! 	{
! 		inhOIDs = list_make1_oid(parentOID);
! 		inhOIDs = list_concat(inhOIDs,
! 							  find_inheritance_children(parentOID, lockmode));
! 	}
! 	else
! 		inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
  
  	/*
  	 * Check that there's at least one descendant, else treat as no-child
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1429,1435 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1445,1451 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1457,1462 ****
--- 1473,1484 ----
  		Index		childRTindex;
  		AppendRelInfo *appinfo;
  
+ 		/*
+ 		 * If this child is a partitioned table, this contains AppendRelInfos
+ 		 * for its own children.
+ 		 */
+ 		List		  *myappinfos;
+ 
  		/* Open rel if needed; we already have required locks */
  		if (childOID != parentOID)
  			newrelation = heap_open(childOID, NoLock);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1490,1496 ****
  		childrte = copyObject(rte);
  		childrte->relid = childOID;
  		childrte->relkind = newrelation->rd_rel->relkind;
! 		childrte->inh = false;
  		childrte->requiredPerms = 0;
  		childrte->securityQuals = NIL;
  		parse->rtable = lappend(parse->rtable, childrte);
--- 1512,1523 ----
  		childrte = copyObject(rte);
  		childrte->relid = childOID;
  		childrte->relkind = newrelation->rd_rel->relkind;
! 		/* A partitioned child will need to be expanded further. */
! 		if (childOID != parentOID &&
! 			childrte->relkind == RELKIND_PARTITIONED_TABLE)
! 			childrte->inh = true;
! 		else
! 			childrte->inh = false;
  		childrte->requiredPerms = 0;
  		childrte->securityQuals = NIL;
  		parse->rtable = lappend(parse->rtable, childrte);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1498,1506 ****
  
  		/*
  		 * Build an AppendRelInfo for this parent and child, unless the child
! 		 * is a partitioned table.
  		 */
! 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
  		{
  			need_append = true;
  			appinfo = makeNode(AppendRelInfo);
--- 1525,1533 ----
  
  		/*
  		 * Build an AppendRelInfo for this parent and child, unless the child
! 		 * RTE simply duplicates the parent *partitioned* table.
  		 */
! 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
  		{
  			need_append = true;
  			appinfo = makeNode(AppendRelInfo);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1570,1575 ****
--- 1597,1610 ----
  		/* Close child relations, but keep locks */
  		if (childOID != parentOID)
  			heap_close(newrelation, NoLock);
+ 
+ 		/* Expand partitioned children recursively. */
+ 		if (childrte->inh)
+ 		{
+ 			myappinfos = expand_inherited_rtentry(root, childrte,
+ 												  childRTindex);
+ 			appinfos = list_concat(appinfos, myappinfos);
+ 		}
  	}
  
  	heap_close(oldrelation, NoLock);
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1585,1591 ****
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return;
  	}
  
  	/*
--- 1620,1626 ----
  	{
  		/* Clear flag before returning */
  		rte->inh = false;
! 		return NIL;
  	}
  
  	/*
*************** expand_inherited_rtentry(PlannerInfo *ro
*** 1606,1613 ****
  		root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
  	}
  
! 	/* Otherwise, OK to add to root->append_rel_list */
! 	root->append_rel_list = list_concat(root->append_rel_list, appinfos);
  }
  
  /*
--- 1641,1648 ----
  		root->pcinfo_list = lappend(root->pcinfo_list, pcinfo);
  	}
  
! 	/* The caller will concatenate this list to root->append_rel_list. */
! 	return appinfos;
  }
  
  /*
*************** translate_col_privs(const Bitmapset *par
*** 1767,1776 ****
  
  /*
   * adjust_appendrel_attrs
!  *	  Copy the specified query or expression and translate Vars referring
!  *	  to the parent rel of the specified AppendRelInfo to refer to the
!  *	  child rel instead.  We also update rtindexes appearing outside Vars,
!  *	  such as resultRelation and jointree relids.
   *
   * Note: this is only applied after conversion of sublinks to subplans,
   * so we don't need to cope with recursion into sub-queries.
--- 1802,1812 ----
  
  /*
   * adjust_appendrel_attrs
!  *	  Copy the specified query or expression and translate Vars referring to
!  *	  the parent rels of the child rels specified in the given list of
!  *	  AppendRelInfos to refer to the corresponding child rel instead.  We also
!  *	  update rtindexes appearing outside Vars, such as resultRelation and
!  *	  jointree relids.
   *
   * Note: this is only applied after conversion of sublinks to subplans,
   * so we don't need to cope with recursion into sub-queries.
*************** translate_col_privs(const Bitmapset *par
*** 1779,1791 ****
   * maybe we should try to fold the two routines together.
   */
  Node *
! adjust_appendrel_attrs(PlannerInfo *root, Node *node, AppendRelInfo *appinfo)
  {
  	Node	   *result;
  	adjust_appendrel_attrs_context context;
  
  	context.root = root;
! 	context.appinfo = appinfo;
  
  	/*
  	 * Must be prepared to start with a Query or a bare expression tree.
--- 1815,1835 ----
   * maybe we should try to fold the two routines together.
   */
  Node *
! adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
! 					   AppendRelInfo **appinfos)
  {
  	Node	   *result;
  	adjust_appendrel_attrs_context context;
  
  	context.root = root;
! 	context.nappinfos = nappinfos;
! 	context.appinfos = appinfos;
! 
! 	/*
! 	 * Catch a caller who wants to adjust expressions, but doesn't pass any
! 	 * AppendRelInfo.
! 	 */
! 	Assert(appinfos && nappinfos >= 1);
  
  	/*
  	 * Must be prepared to start with a Query or a bare expression tree.
*************** adjust_appendrel_attrs(PlannerInfo *root
*** 1793,1812 ****
  	if (node && IsA(node, Query))
  	{
  		Query	   *newnode;
  
  		newnode = query_tree_mutator((Query *) node,
  									 adjust_appendrel_attrs_mutator,
  									 (void *) &context,
  									 QTW_IGNORE_RC_SUBQUERIES);
! 		if (newnode->resultRelation == appinfo->parent_relid)
  		{
! 			newnode->resultRelation = appinfo->child_relid;
! 			/* Fix tlist resnos too, if it's inherited UPDATE */
! 			if (newnode->commandType == CMD_UPDATE)
! 				newnode->targetList =
! 					adjust_inherited_tlist(newnode->targetList,
! 										   appinfo);
  		}
  		result = (Node *) newnode;
  	}
  	else
--- 1837,1864 ----
  	if (node && IsA(node, Query))
  	{
  		Query	   *newnode;
+ 		int		cnt;
  
  		newnode = query_tree_mutator((Query *) node,
  									 adjust_appendrel_attrs_mutator,
  									 (void *) &context,
  									 QTW_IGNORE_RC_SUBQUERIES);
! 		for (cnt = 0; cnt < nappinfos; cnt++)
  		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (newnode->resultRelation == appinfo->parent_relid)
! 			{
! 				newnode->resultRelation = appinfo->child_relid;
! 				/* Fix tlist resnos too, if it's inherited UPDATE */
! 				if (newnode->commandType == CMD_UPDATE)
! 					newnode->targetList =
! 									adjust_inherited_tlist(newnode->targetList,
! 														   appinfo);
! 				break;
! 			}
  		}
+ 
  		result = (Node *) newnode;
  	}
  	else
*************** static Node *
*** 1819,1831 ****
  adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context)
  {
! 	AppendRelInfo *appinfo = context->appinfo;
  
  	if (node == NULL)
  		return NULL;
  	if (IsA(node, Var))
  	{
  		Var		   *var = (Var *) copyObject(node);
  
  		if (var->varlevelsup == 0 &&
  			var->varno == appinfo->parent_relid)
--- 1871,1900 ----
  adjust_appendrel_attrs_mutator(Node *node,
  							   adjust_appendrel_attrs_context *context)
  {
! 	AppendRelInfo **appinfos = context->appinfos;
! 	int		nappinfos = context->nappinfos;
! 	int		cnt;
! 
! 	/*
! 	 * Catch a caller who wants to adjust expressions, but doesn't pass any
! 	 * AppendRelInfo.
! 	 */
! 	Assert(appinfos && nappinfos >= 1);
  
  	if (node == NULL)
  		return NULL;
  	if (IsA(node, Var))
  	{
  		Var		   *var = (Var *) copyObject(node);
+ 		AppendRelInfo *appinfo;
+ 
+ 		for (cnt = 0; cnt < nappinfos; cnt++)
+ 		{
+ 			appinfo = appinfos[cnt];
+ 
+ 			if (var->varno == appinfo->parent_relid)
+ 				break;
+ 		}
  
  		if (var->varlevelsup == 0 &&
  			var->varno == appinfo->parent_relid)
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1908,1936 ****
  	{
  		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
  
! 		if (cexpr->cvarno == appinfo->parent_relid)
! 			cexpr->cvarno = appinfo->child_relid;
  		return (Node *) cexpr;
  	}
  	if (IsA(node, RangeTblRef))
  	{
  		RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
  
! 		if (rtr->rtindex == appinfo->parent_relid)
! 			rtr->rtindex = appinfo->child_relid;
  		return (Node *) rtr;
  	}
  	if (IsA(node, JoinExpr))
  	{
  		/* Copy the JoinExpr node with correct mutation of subnodes */
  		JoinExpr   *j;
  
  		j = (JoinExpr *) expression_tree_mutator(node,
  											  adjust_appendrel_attrs_mutator,
  												 (void *) context);
  		/* now fix JoinExpr's rtindex (probably never happens) */
! 		if (j->rtindex == appinfo->parent_relid)
! 			j->rtindex = appinfo->child_relid;
  		return (Node *) j;
  	}
  	if (IsA(node, PlaceHolderVar))
--- 1977,2030 ----
  	{
  		CurrentOfExpr *cexpr = (CurrentOfExpr *) copyObject(node);
  
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (cexpr->cvarno == appinfo->parent_relid)
! 			{
! 				cexpr->cvarno = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) cexpr;
  	}
  	if (IsA(node, RangeTblRef))
  	{
  		RangeTblRef *rtr = (RangeTblRef *) copyObject(node);
  
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			AppendRelInfo *appinfo = appinfos[cnt];
! 
! 			if (rtr->rtindex == appinfo->parent_relid)
! 			{
! 				rtr->rtindex = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) rtr;
  	}
  	if (IsA(node, JoinExpr))
  	{
  		/* Copy the JoinExpr node with correct mutation of subnodes */
  		JoinExpr   *j;
+ 		AppendRelInfo *appinfo;
  
  		j = (JoinExpr *) expression_tree_mutator(node,
  											  adjust_appendrel_attrs_mutator,
  												 (void *) context);
  		/* now fix JoinExpr's rtindex (probably never happens) */
! 		for (cnt = 0; cnt < nappinfos; cnt++)
! 		{
! 			appinfo = appinfos[cnt];
! 
! 			if (j->rtindex == appinfo->parent_relid)
! 			{
! 				j->rtindex = appinfo->child_relid;
! 				break;
! 			}
! 		}
  		return (Node *) j;
  	}
  	if (IsA(node, PlaceHolderVar))
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1943,1951 ****
  														 (void *) context);
  		/* now fix PlaceHolderVar's relid sets */
  		if (phv->phlevelsup == 0)
! 			phv->phrels = adjust_relid_set(phv->phrels,
! 										   appinfo->parent_relid,
! 										   appinfo->child_relid);
  		return (Node *) phv;
  	}
  	/* Shouldn't need to handle planner auxiliary nodes here */
--- 2037,2044 ----
  														 (void *) context);
  		/* now fix PlaceHolderVar's relid sets */
  		if (phv->phlevelsup == 0)
! 			phv->phrels = adjust_child_relids(phv->phrels, context->nappinfos,
! 											  context->appinfos);
  		return (Node *) phv;
  	}
  	/* Shouldn't need to handle planner auxiliary nodes here */
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 1976,1999 ****
  			adjust_appendrel_attrs_mutator((Node *) oldinfo->orclause, context);
  
  		/* adjust relid sets too */
! 		newinfo->clause_relids = adjust_relid_set(oldinfo->clause_relids,
! 												  appinfo->parent_relid,
! 												  appinfo->child_relid);
! 		newinfo->required_relids = adjust_relid_set(oldinfo->required_relids,
! 													appinfo->parent_relid,
! 													appinfo->child_relid);
! 		newinfo->outer_relids = adjust_relid_set(oldinfo->outer_relids,
! 												 appinfo->parent_relid,
! 												 appinfo->child_relid);
! 		newinfo->nullable_relids = adjust_relid_set(oldinfo->nullable_relids,
! 													appinfo->parent_relid,
! 													appinfo->child_relid);
! 		newinfo->left_relids = adjust_relid_set(oldinfo->left_relids,
! 												appinfo->parent_relid,
! 												appinfo->child_relid);
! 		newinfo->right_relids = adjust_relid_set(oldinfo->right_relids,
! 												 appinfo->parent_relid,
! 												 appinfo->child_relid);
  
  		/*
  		 * Reset cached derivative fields, since these might need to have
--- 2069,2092 ----
  			adjust_appendrel_attrs_mutator((Node *) oldinfo->orclause, context);
  
  		/* adjust relid sets too */
! 		newinfo->clause_relids = adjust_child_relids(oldinfo->clause_relids,
! 													 context->nappinfos,
! 													 context->appinfos);
! 		newinfo->required_relids = adjust_child_relids(oldinfo->required_relids,
! 													   context->nappinfos,
! 													   context->appinfos);
! 		newinfo->outer_relids = adjust_child_relids(oldinfo->outer_relids,
! 													context->nappinfos,
! 													context->appinfos);
! 		newinfo->nullable_relids = adjust_child_relids(oldinfo->nullable_relids,
! 													   context->nappinfos,
! 													   context->appinfos);
! 		newinfo->left_relids = adjust_child_relids(oldinfo->left_relids,
! 												   context->nappinfos,
! 												   context->appinfos);
! 		newinfo->right_relids = adjust_child_relids(oldinfo->right_relids,
! 													context->nappinfos,
! 													context->appinfos);
  
  		/*
  		 * Reset cached derivative fields, since these might need to have
*************** adjust_appendrel_attrs_mutator(Node *nod
*** 2025,2047 ****
  }
  
  /*
!  * Substitute newrelid for oldrelid in a Relid set
   */
! static Relids
! adjust_relid_set(Relids relids, Index oldrelid, Index newrelid)
  {
! 	if (bms_is_member(oldrelid, relids))
  	{
! 		/* Ensure we have a modifiable copy */
! 		relids = bms_copy(relids);
! 		/* Remove old, add new */
! 		relids = bms_del_member(relids, oldrelid);
! 		relids = bms_add_member(relids, newrelid);
  	}
  	return relids;
  }
  
  /*
   * Adjust the targetlist entries of an inherited UPDATE operation
   *
   * The expressions have already been fixed, but we have to make sure that
--- 2118,2212 ----
  }
  
  /*
!  * Replace parent relids by child relids in a copy of the given relid set
!  * according to the given array of AppendRelInfos. If the set contains none
!  * of the listed parents, it is returned as is; otherwise a modified copy is
!  * returned and the given relid set itself is left unchanged.
   */
! Relids
! adjust_child_relids(Relids relids, int nappinfos, AppendRelInfo **appinfos)
  {
! 	Bitmapset  *result = NULL;
! 	int		cnt;
! 
! 	for (cnt = 0; cnt < nappinfos; cnt++)
  	{
! 		AppendRelInfo	*appinfo = appinfos[cnt];
! 
! 		/* Remove parent, add child */
! 		if (bms_is_member(appinfo->parent_relid, relids))
! 		{
! 			/* Make a copy if we are changing the set. */
! 			if (!result)
! 				result = bms_copy(relids);
! 
! 			result = bms_del_member(result, appinfo->parent_relid);
! 			result = bms_add_member(result, appinfo->child_relid);
! 		}
  	}
+ 
+ 	/* Return new set if we modified the given set. */
+ 	if (result)
+ 		return result;
+ 
+ 	/* Else return the given relids set as is. */
  	return relids;
  }
  
  /*
+  * Replace any relid present in top_parent_relids with its child in
+  * child_relids. Members of child_relids can be multiple levels below top
+  * parent in the partition hierarchy.
+  */
+ Relids
+ adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
+ 							   Relids child_relids, Relids top_parent_relids)
+ {
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 	Relids		parent_relids = NULL;
+ 	Relids		result;
+ 	Relids		tmp_result = NULL;
+ 	int		cnt;
+ 
+ 	/*
+ 	 * If the given relids set doesn't contain any of the top parent relids,
+ 	 * it will remain unchanged.
+ 	 */
+ 	if (!bms_overlap(relids, top_parent_relids))
+ 		return relids;
+ 
+ 	appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos);
+ 
+ 	/* Construct relids set for the immediate parent of the given child. */
+ 	for (cnt = 0; cnt < nappinfos; cnt++)
+ 	{
+ 		AppendRelInfo   *appinfo = appinfos[cnt];
+ 
+ 		parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
+ 	}
+ 
+ 	/* Recurse if immediate parent is not the top parent. */
+ 	if (!bms_equal(parent_relids, top_parent_relids))
+ 	{
+ 		tmp_result = adjust_child_relids_multilevel(root, relids,
+ 													parent_relids,
+ 													top_parent_relids);
+ 		relids = tmp_result;
+ 	}
+ 
+ 	result = adjust_child_relids(relids, nappinfos, appinfos);
+ 
+ 	/* Free memory consumed by any intermediate result. */
+ 	if (tmp_result)
+ 		bms_free(tmp_result);
+ 	bms_free(parent_relids);
+ 	pfree(appinfos);
+ 
+ 	return result;
+ }
+ 
+ /*
   * Adjust the targetlist entries of an inherited UPDATE operation
   *
   * The expressions have already been fixed, but we have to make sure that
*************** adjust_inherited_tlist(List *tlist, Appe
*** 2142,2162 ****
   * adjust_appendrel_attrs_multilevel
   *	  Apply Var translations from a toplevel appendrel parent down to a child.
   *
!  * In some cases we need to translate expressions referencing a baserel
   * to reference an appendrel child that's multiple levels removed from it.
   */
  Node *
  adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  RelOptInfo *child_rel)
  {
! 	AppendRelInfo *appinfo = find_childrel_appendrelinfo(root, child_rel);
! 	RelOptInfo *parent_rel = find_base_rel(root, appinfo->parent_relid);
  
- 	/* If parent is also a child, first recurse to apply its translations */
- 	if (IS_OTHER_REL(parent_rel))
- 		node = adjust_appendrel_attrs_multilevel(root, node, parent_rel);
- 	else
- 		Assert(parent_rel->reloptkind == RELOPT_BASEREL);
  	/* Now translate for this child */
! 	return adjust_appendrel_attrs(root, node, appinfo);
  }
--- 2307,2432 ----
   * adjust_appendrel_attrs_multilevel
   *	  Apply Var translations from a toplevel appendrel parent down to a child.
   *
!  * In some cases we need to translate expressions referencing a parent relation
   * to reference an appendrel child that's multiple levels removed from it.
   */
  Node *
  adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  Relids child_relids,
! 								  Relids top_parent_relids)
  {
! 	AppendRelInfo **appinfos;
! 	Bitmapset  *parent_relids = NULL;
! 	int		nappinfos;
! 	int		cnt;
! 
! 	Assert(bms_num_members(child_relids) == bms_num_members(top_parent_relids));
! 
! 	appinfos = find_appinfos_by_relids(root, child_relids, &nappinfos);
! 
! 	/* Construct relids set for the immediate parent of given child. */
! 	for (cnt = 0; cnt < nappinfos; cnt++)
! 	{
! 		AppendRelInfo  *appinfo = appinfos[cnt];
! 
! 		parent_relids = bms_add_member(parent_relids, appinfo->parent_relid);
! 	}
! 
! 	/* Recurse if immediate parent is not the top parent. */
! 	if (!bms_equal(parent_relids, top_parent_relids))
! 		node = adjust_appendrel_attrs_multilevel(root, node, parent_relids,
! 												 top_parent_relids);
  
  	/* Now translate for this child */
! 	node = adjust_appendrel_attrs(root, node, nappinfos, appinfos);
! 
! 	pfree(appinfos);
! 
! 	return node;
! }
! 
! /*
!  * Construct the SpecialJoinInfo for a child-join by translating
!  * SpecialJoinInfo for the join between parents. left_relids and right_relids
!  * are the relids of left and right side of the join respectively.
!  */
! SpecialJoinInfo *
! build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
! 						Relids left_relids, Relids right_relids)
! {
! 	SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo);
! 	AppendRelInfo **left_appinfos;
! 	int		left_nappinfos;
! 	AppendRelInfo **right_appinfos;
! 	int		right_nappinfos;
! 
! 	memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo));
! 	left_appinfos = find_appinfos_by_relids(root, left_relids,
! 											&left_nappinfos);
! 	right_appinfos = find_appinfos_by_relids(root, right_relids,
! 											 &right_nappinfos);
! 
! 	sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand,
! 											   left_nappinfos, left_appinfos);
! 	sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand,
! 												right_nappinfos,
! 												right_appinfos);
! 	sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand,
! 											   left_nappinfos, left_appinfos);
! 	sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand,
! 												right_nappinfos,
! 												right_appinfos);
! 
! 	/*
! 	 * Replace the Var nodes of parent with those of children in expressions.
! 	 * This function may be called within a temporary context, but the
! 	 * expressions will be shallow-copied into the plan. Hence copy those into
! 	 * the planner's context.
! 	 */
! 	sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root,
! 											   (Node *) sjinfo->semi_rhs_exprs,
! 															   right_nappinfos,
! 															   right_appinfos);
! 
! 	pfree(left_appinfos);
! 	pfree(right_appinfos);
! 
! 	return sjinfo;
! }
! 
! /*
!  * find_appinfos_by_relids
!  * 		Find AppendRelInfo structures for all relations specified by relids.
!  *
!  * The AppendRelInfos are returned in an array, which can be pfree'd by the
!  * caller.
!  */
! AppendRelInfo **
! find_appinfos_by_relids(PlannerInfo *root, Relids relids, int *nappinfos)
! {
! 	ListCell   *lc;
! 	AppendRelInfo **appinfos;
! 	int		cnt = 0;
! 
! 	*nappinfos = bms_num_members(relids);
! 	appinfos = (AppendRelInfo **) palloc(sizeof(AppendRelInfo *) * *nappinfos);
! 
! 	foreach (lc, root->append_rel_list)
! 	{
! 		AppendRelInfo *appinfo = lfirst(lc);
! 
! 		if (bms_is_member(appinfo->child_relid, relids))
! 		{
! 			appinfos[cnt] = appinfo;
! 			cnt++;
! 
! 			/* Stop when we have gathered all the AppendRelInfos. */
! 			if (cnt == *nappinfos)
! 				return appinfos;
! 		}
! 	}
! 
! 	/* Should have found the entries ... */
! 	elog(ERROR, "did not find one or more of requested child rels in append_rel_list");
! 	return NULL;	/* not reached */
  }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
new file mode 100644
index 2d5caae..79a000b
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
***************
*** 18,32 ****
--- 18,39 ----
  
  #include "miscadmin.h"
  #include "nodes/nodeFuncs.h"
+ #include "nodes/extensible.h"
  #include "optimizer/clauses.h"
  #include "optimizer/cost.h"
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
+ #include "optimizer/tlist.h"
+ /* TODO Remove this if get_grouping_expressions ends up in another module. */
+ #include "optimizer/tlist.h"
  #include "optimizer/var.h"
  #include "parser/parsetree.h"
+ #include "foreign/fdwapi.h"
  #include "utils/lsyscache.h"
+ #include "utils/memutils.h"
  #include "utils/selfuncs.h"
  
  
*************** set_cheapest(RelOptInfo *parent_rel)
*** 409,416 ****
   * Returns nothing, but modifies parent_rel->pathlist.
   */
  void
! add_path(RelOptInfo *parent_rel, Path *new_path)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	List	   *new_path_pathkeys;
--- 416,424 ----
   * Returns nothing, but modifies parent_rel->pathlist.
   */
  void
! add_path(RelOptInfo *parent_rel, Path *new_path, bool grouped)
  {
+ 	List	   *pathlist;
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	List	   *new_path_pathkeys;
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 427,432 ****
--- 435,448 ----
  	/* Pretend parameterized paths have no pathkeys, per comment above */
  	new_path_pathkeys = new_path->param_info ? NIL : new_path->pathkeys;
  
+ 	if (!grouped)
+ 		pathlist = parent_rel->pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->pathlist;
+ 	}
+ 
  	/*
  	 * Loop to check proposed new path against old paths.  Note it is possible
  	 * for more than one old path to be tossed out because new_path dominates
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 436,442 ****
  	 * list cell.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(parent_rel->pathlist); p1 != NULL; p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		bool		remove_old = false; /* unless new proves superior */
--- 452,458 ----
  	 * list cell.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(pathlist); p1 != NULL; p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		bool		remove_old = false; /* unless new proves superior */
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 582,589 ****
  		 */
  		if (remove_old)
  		{
! 			parent_rel->pathlist = list_delete_cell(parent_rel->pathlist,
! 													p1, p1_prev);
  
  			/*
  			 * Delete the data pointed-to by the deleted cell, if possible
--- 598,604 ----
  		 */
  		if (remove_old)
  		{
! 			pathlist = list_delete_cell(pathlist, p1, p1_prev);
  
  			/*
  			 * Delete the data pointed-to by the deleted cell, if possible
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 614,622 ****
  	{
  		/* Accept the new path: insert it at proper place in pathlist */
  		if (insert_after)
! 			lappend_cell(parent_rel->pathlist, insert_after, new_path);
  		else
! 			parent_rel->pathlist = lcons(new_path, parent_rel->pathlist);
  	}
  	else
  	{
--- 629,642 ----
  	{
  		/* Accept the new path: insert it at proper place in pathlist */
  		if (insert_after)
! 			lappend_cell(pathlist, insert_after, new_path);
  		else
! 			pathlist = lcons(new_path, pathlist);
! 
! 		if (!grouped)
! 			parent_rel->pathlist = pathlist;
! 		else
! 			parent_rel->gpi->pathlist = pathlist;
  	}
  	else
  	{
*************** add_path(RelOptInfo *parent_rel, Path *n
*** 646,653 ****
  bool
  add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer)
  {
  	List	   *new_path_pathkeys;
  	bool		consider_startup;
  	ListCell   *p1;
--- 666,674 ----
  bool
  add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer, bool grouped)
  {
+ 	List	   *pathlist;
  	List	   *new_path_pathkeys;
  	bool		consider_startup;
  	ListCell   *p1;
*************** add_path_precheck(RelOptInfo *parent_rel
*** 656,664 ****
  	new_path_pathkeys = required_outer ? NIL : pathkeys;
  
  	/* Decide whether new path's startup cost is interesting */
! 	consider_startup = required_outer ? parent_rel->consider_param_startup : parent_rel->consider_startup;
  
! 	foreach(p1, parent_rel->pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
--- 677,694 ----
  	new_path_pathkeys = required_outer ? NIL : pathkeys;
  
  	/* Decide whether new path's startup cost is interesting */
! 	consider_startup = required_outer ? parent_rel->consider_param_startup :
! 		parent_rel->consider_startup;
  
! 	if (!grouped)
! 		pathlist = parent_rel->pathlist;
! 	else
! 	{
! 		Assert(parent_rel->gpi != NULL);
! 		pathlist = parent_rel->gpi->pathlist;
! 	}
! 
! 	foreach(p1, pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
*************** add_path_precheck(RelOptInfo *parent_rel
*** 749,771 ****
   *	  referenced by partial BitmapHeapPaths.
   */
  void
! add_partial_path(RelOptInfo *parent_rel, Path *new_path)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	ListCell   *p1;
  	ListCell   *p1_prev;
  	ListCell   *p1_next;
  
  	/* Check for query cancel. */
  	CHECK_FOR_INTERRUPTS();
  
  	/*
  	 * As in add_path, throw out any paths which are dominated by the new
  	 * path, but throw out the new path if some existing path dominates it.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(parent_rel->partial_pathlist); p1 != NULL;
  		 p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
--- 779,810 ----
   *	  referenced by partial BitmapHeapPaths.
   */
  void
! add_partial_path(RelOptInfo *parent_rel, Path *new_path, bool grouped)
  {
  	bool		accept_new = true;		/* unless we find a superior old path */
  	ListCell   *insert_after = NULL;	/* where to insert new item */
  	ListCell   *p1;
  	ListCell   *p1_prev;
  	ListCell   *p1_next;
+ 	List	   *pathlist;
  
  	/* Check for query cancel. */
  	CHECK_FOR_INTERRUPTS();
  
+ 	if (!grouped)
+ 		pathlist = parent_rel->partial_pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->partial_pathlist;
+ 	}
+ 
  	/*
  	 * As in add_path, throw out any paths which are dominated by the new
  	 * path, but throw out the new path if some existing path dominates it.
  	 */
  	p1_prev = NULL;
! 	for (p1 = list_head(pathlist); p1 != NULL;
  		 p1 = p1_next)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
*************** add_partial_path(RelOptInfo *parent_rel,
*** 819,830 ****
  		}
  
  		/*
! 		 * Remove current element from partial_pathlist if dominated by new.
  		 */
  		if (remove_old)
  		{
! 			parent_rel->partial_pathlist =
! 				list_delete_cell(parent_rel->partial_pathlist, p1, p1_prev);
  			pfree(old_path);
  			/* p1_prev does not advance */
  		}
--- 858,868 ----
  		}
  
  		/*
! 		 * Remove current element from pathlist if dominated by new.
  		 */
  		if (remove_old)
  		{
! 			pathlist = list_delete_cell(pathlist, p1, p1_prev);
  			pfree(old_path);
  			/* p1_prev does not advance */
  		}
*************** add_partial_path(RelOptInfo *parent_rel,
*** 839,845 ****
  
  		/*
  		 * If we found an old path that dominates new_path, we can quit
! 		 * scanning the partial_pathlist; we will not add new_path, and we
  		 * assume new_path cannot dominate any later path.
  		 */
  		if (!accept_new)
--- 877,883 ----
  
  		/*
  		 * If we found an old path that dominates new_path, we can quit
! 		 * scanning the pathlist; we will not add new_path, and we
  		 * assume new_path cannot dominate any later path.
  		 */
  		if (!accept_new)
*************** add_partial_path(RelOptInfo *parent_rel,
*** 850,859 ****
  	{
  		/* Accept the new path: insert it at proper place */
  		if (insert_after)
! 			lappend_cell(parent_rel->partial_pathlist, insert_after, new_path);
  		else
! 			parent_rel->partial_pathlist =
! 				lcons(new_path, parent_rel->partial_pathlist);
  	}
  	else
  	{
--- 888,901 ----
  	{
  		/* Accept the new path: insert it at proper place */
  		if (insert_after)
! 			lappend_cell(pathlist, insert_after, new_path);
  		else
! 			pathlist = lcons(new_path, pathlist);
! 
! 		if (!grouped)
! 			parent_rel->partial_pathlist = pathlist;
! 		else
! 			parent_rel->gpi->partial_pathlist = pathlist;
  	}
  	else
  	{
*************** add_partial_path(RelOptInfo *parent_rel,
*** 874,882 ****
   */
  bool
  add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
! 						  List *pathkeys)
  {
  	ListCell   *p1;
  
  	/*
  	 * Our goal here is twofold.  First, we want to find out whether this path
--- 916,933 ----
   */
  bool
  add_partial_path_precheck(RelOptInfo *parent_rel, Cost total_cost,
! 						  List *pathkeys, bool grouped)
  {
  	ListCell   *p1;
+ 	List	   *pathlist;
+ 
+ 	if (!grouped)
+ 		pathlist = parent_rel->partial_pathlist;
+ 	else
+ 	{
+ 		Assert(parent_rel->gpi != NULL);
+ 		pathlist = parent_rel->gpi->partial_pathlist;
+ 	}
  
  	/*
  	 * Our goal here is twofold.  First, we want to find out whether this path
*************** add_partial_path_precheck(RelOptInfo *pa
*** 886,895 ****
  	 * final cost computations.  If so, we definitely want to consider it.
  	 *
  	 * Unlike add_path(), we always compare pathkeys here.  This is because we
! 	 * expect partial_pathlist to be very short, and getting a definitive
! 	 * answer at this stage avoids the need to call add_path_precheck.
  	 */
! 	foreach(p1, parent_rel->partial_pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
--- 937,947 ----
  	 * final cost computations.  If so, we definitely want to consider it.
  	 *
  	 * Unlike add_path(), we always compare pathkeys here.  This is because we
! 	 * expect partial_pathlist / grouped_pathlist to be very short, and
! 	 * getting a definitive answer at this stage avoids the need to call
! 	 * add_path_precheck.
  	 */
! 	foreach(p1, pathlist)
  	{
  		Path	   *old_path = (Path *) lfirst(p1);
  		PathKeysComparison keyscmp;
*************** add_partial_path_precheck(RelOptInfo *pa
*** 918,924 ****
  	 * completion.
  	 */
  	if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
! 						   NULL))
  		return false;
  
  	return true;
--- 970,976 ----
  	 * completion.
  	 */
  	if (!add_path_precheck(parent_rel, total_cost, total_cost, pathkeys,
! 						   NULL, grouped))
  		return false;
  
  	return true;
*************** create_foreignscan_path(PlannerInfo *roo
*** 1994,2007 ****
   * Note: result must not share storage with either input
   */
  Relids
! calc_nestloop_required_outer(Path *outer_path, Path *inner_path)
  {
- 	Relids		outer_paramrels = PATH_REQ_OUTER(outer_path);
- 	Relids		inner_paramrels = PATH_REQ_OUTER(inner_path);
  	Relids		required_outer;
  
  	/* inner_path can require rels from outer path, but not vice versa */
! 	Assert(!bms_overlap(outer_paramrels, inner_path->parent->relids));
  	/* easy case if inner path is not parameterized */
  	if (!inner_paramrels)
  		return bms_copy(outer_paramrels);
--- 2046,2060 ----
   * Note: result must not share storage with either input
   */
  Relids
! calc_nestloop_required_outer(Relids outerrelids,
! 							 Relids outer_paramrels,
! 							 Relids innerrelids,
! 							 Relids inner_paramrels)
  {
  	Relids		required_outer;
  
  	/* inner_path can require rels from outer path, but not vice versa */
! 	Assert(!bms_overlap(outer_paramrels, innerrelids));
  	/* easy case if inner path is not parameterized */
  	if (!inner_paramrels)
  		return bms_copy(outer_paramrels);
*************** calc_nestloop_required_outer(Path *outer
*** 2009,2015 ****
  	required_outer = bms_union(outer_paramrels, inner_paramrels);
  	/* ... and remove any mention of now-satisfied outer rels */
  	required_outer = bms_del_members(required_outer,
! 									 outer_path->parent->relids);
  	/* maintain invariant that required_outer is exactly NULL if empty */
  	if (bms_is_empty(required_outer))
  	{
--- 2062,2068 ----
  	required_outer = bms_union(outer_paramrels, inner_paramrels);
  	/* ... and remove any mention of now-satisfied outer rels */
  	required_outer = bms_del_members(required_outer,
! 									 outerrelids);
  	/* maintain invariant that required_outer is exactly NULL if empty */
  	if (bms_is_empty(required_outer))
  	{
*************** calc_non_nestloop_required_outer(Path *o
*** 2055,2060 ****
--- 2108,2114 ----
   * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
   * 'pathkeys' are the path keys of the new join path
   * 'required_outer' is the set of required outer rels
+  * 'target' can be passed to override that of joinrel.
   *
   * Returns the resulting path node.
   */
*************** create_nestloop_path(PlannerInfo *root,
*** 2068,2074 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer)
  {
  	NestPath   *pathnode = makeNode(NestPath);
  	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);
--- 2122,2129 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer,
! 					 PathTarget *target)
  {
  	NestPath   *pathnode = makeNode(NestPath);
  	Relids		inner_req_outer = PATH_REQ_OUTER(inner_path);
*************** create_nestloop_path(PlannerInfo *root,
*** 2101,2107 ****
  
  	pathnode->path.pathtype = T_NestLoop;
  	pathnode->path.parent = joinrel;
! 	pathnode->path.pathtarget = joinrel->reltarget;
  	pathnode->path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2156,2162 ----
  
  	pathnode->path.pathtype = T_NestLoop;
  	pathnode->path.parent = joinrel;
! 	pathnode->path.pathtarget = target == NULL ? joinrel->reltarget : target;
  	pathnode->path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_mergejoin_path(PlannerInfo *root,
*** 2159,2171 ****
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys)
  {
  	MergePath  *pathnode = makeNode(MergePath);
  
  	pathnode->jpath.path.pathtype = T_MergeJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = joinrel->reltarget;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2214,2228 ----
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys,
! 					  PathTarget *target)
  {
  	MergePath  *pathnode = makeNode(MergePath);
  
  	pathnode->jpath.path.pathtype = T_MergeJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = target == NULL ? joinrel->reltarget :
! 		target;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_mergejoin_path(PlannerInfo *root,
*** 2210,2215 ****
--- 2267,2273 ----
   * 'required_outer' is the set of required outer rels
   * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
   *		(this should be a subset of the restrict_clauses list)
+  * 'target' can be passed to override that of joinrel.
   */
  HashPath *
  create_hashjoin_path(PlannerInfo *root,
*************** create_hashjoin_path(PlannerInfo *root,
*** 2221,2233 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses)
  {
  	HashPath   *pathnode = makeNode(HashPath);
  
  	pathnode->jpath.path.pathtype = T_HashJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = joinrel->reltarget;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
--- 2279,2293 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses,
! 					 PathTarget *target)
  {
  	HashPath   *pathnode = makeNode(HashPath);
  
  	pathnode->jpath.path.pathtype = T_HashJoin;
  	pathnode->jpath.path.parent = joinrel;
! 	pathnode->jpath.path.pathtarget = target == NULL ? joinrel->reltarget :
! 		target;
  	pathnode->jpath.path.param_info =
  		get_joinrel_parampathinfo(root,
  								  joinrel,
*************** create_agg_path(PlannerInfo *root,
*** 2713,2718 ****
--- 2773,2948 ----
  }
  
  /*
+  * Apply partial AGG_SORTED aggregation path to subpath if it's suitably
+  * sorted.
+  *
+  * first_call indicates whether the function is being called for the first
+  * time for the given index --- since the target should not change, we can
+  * skip the sorting check during subsequent calls.
+  *
+  * group_clauses, group_exprs and agg_exprs are pointers to lists that we
+  * populate on the first call for a particular index, and that the caller
+  * passes back in on subsequent calls.
+  *
+  * NULL is returned if sorting of subpath output is not suitable.
+  */
+ AggPath *
+ create_partial_agg_sorted_path(PlannerInfo *root, Path *subpath,
+ 							   bool first_call,
+ 							   List **group_clauses, List **group_exprs,
+ 							   List **agg_exprs, double input_rows)
+ {
+ 	RelOptInfo	*rel;
+ 	AggClauseCosts  agg_costs;
+ 	double	dNumGroups;
+ 	AggPath	*result = NULL;
+ 
+ 	rel = subpath->parent;
+ 	Assert(rel->gpi != NULL);
+ 
+ 	if (subpath->pathkeys == NIL)
+ 		return NULL;
+ 
+ 	if (!grouping_is_sortable(root->parse->groupClause))
+ 		return NULL;
+ 
+ 	if (first_call)
+ 	{
+ 		ListCell	*lc1;
+ 		List	*key_subset = NIL;
+ 
+ 		/*
+ 		 * Find all query pathkeys that our relation does affect.
+ 		 */
+ 		foreach(lc1, root->group_pathkeys)
+ 		{
+ 			PathKey	*gkey = castNode(PathKey, lfirst(lc1));
+ 			ListCell	*lc2;
+ 
+ 			foreach(lc2, subpath->pathkeys)
+ 			{
+ 				PathKey	*skey = castNode(PathKey, lfirst(lc2));
+ 
+ 				if (skey == gkey)
+ 				{
+ 					key_subset = lappend(key_subset, gkey);
+ 					break;
+ 				}
+ 			}
+ 		}
+ 
+ 		if (key_subset == NIL)
+ 			return NULL;
+ 
+ 		/* Check if AGG_SORTED is useful for the whole query.  */
+ 		if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
+ 			return NULL;
+ 	}
+ 
+ 	if (first_call)
+ 		get_grouping_expressions(root, rel->gpi->target, group_clauses,
+ 								 group_exprs, agg_exprs);
+ 
+ 	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ 	Assert(*agg_exprs != NIL);
+ 	get_agg_clause_costs(root, (Node *) *agg_exprs, AGGSPLIT_INITIAL_SERIAL,
+ 						 &agg_costs);
+ 
+ 	Assert(*group_exprs != NIL);
+ 	dNumGroups = estimate_num_groups(root, *group_exprs, input_rows, NULL);
+ 
+ 	/* TODO HAVING qual. */
+ 	Assert(*group_clauses != NIL);
+ 	result = create_agg_path(root, rel, subpath, rel->gpi->target, AGG_SORTED,
+ 							 AGGSPLIT_INITIAL_SERIAL, *group_clauses, NIL,
+ 							 &agg_costs, dNumGroups);
+ 
+ 	return result;
+ }
+ 
+ /*
+  * Apply partial AGG_HASHED aggregation to subpath.
+  *
+  * Arguments have the same meaning as those of create_partial_agg_sorted_path.
+  *
+  */
+ AggPath *
+ create_partial_agg_hashed_path(PlannerInfo *root, Path *subpath,
+ 							   bool first_call,
+ 							   List **group_clauses, List **group_exprs,
+ 							   List **agg_exprs, double input_rows)
+ {
+ 	RelOptInfo	*rel;
+ 	bool	can_hash;
+ 	AggClauseCosts  agg_costs;
+ 	double	dNumGroups;
+ 	Size	hashaggtablesize;
+ 	Query	   *parse = root->parse;
+ 	AggPath	*result = NULL;
+ 
+ 	rel = subpath->parent;
+ 	Assert(rel->gpi != NULL);
+ 
+ 	if (first_call)
+ 	{
+ 		/*
+ 		 * Find one grouping clause per grouping column.
+ 		 *
+ 		 * All that create_agg_plan eventually needs of the clause is
+ 		 * tleSortGroupRef, so we don't have to care that the clause
+ 		 * expression might differ from texpr, in case texpr was derived from
+ 		 * EC.
+ 		 */
+ 		get_grouping_expressions(root, rel->gpi->target, group_clauses,
+ 								 group_exprs, agg_exprs);
+ 	}
+ 
+ 	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ 	Assert(*agg_exprs != NIL);
+ 	get_agg_clause_costs(root, (Node *) *agg_exprs, AGGSPLIT_INITIAL_SERIAL,
+ 						 &agg_costs);
+ 
+ 	can_hash = (parse->groupClause != NIL &&
+ 				parse->groupingSets == NIL &&
+ 				agg_costs.numOrderedAggs == 0 &&
+ 				grouping_is_hashable(parse->groupClause));
+ 
+ 	if (can_hash)
+ 	{
+ 		Assert(*group_exprs != NIL);
+ 		dNumGroups = estimate_num_groups(root, *group_exprs, input_rows,
+ 										 NULL);
+ 
+ 		hashaggtablesize = estimate_hashagg_tablesize(subpath, &agg_costs,
+ 													  dNumGroups);
+ 
+ 		if (hashaggtablesize < work_mem * 1024L)
+ 		{
+ 			/*
+ 			 * Create the partial aggregation path.
+ 			 */
+ 			Assert(*group_clauses != NIL);
+ 
+ 			result = create_agg_path(root, rel, subpath,
+ 									 rel->gpi->target,
+ 									 AGG_HASHED,
+ 									 AGGSPLIT_INITIAL_SERIAL,
+ 									 *group_clauses, NIL,
+ 									 &agg_costs,
+ 									 dNumGroups);
+ 
+ 			/*
+ 			 * The agg path should require no fewer parameters than the plain
+ 			 * one.
+ 			 */
+ 			result->path.param_info = subpath->param_info;
+ 		}
+ 	}
+ 
+ 	return result;
+ }
+ 
+ /*
   * create_groupingsets_path
   *	  Creates a pathnode that represents performing GROUPING SETS aggregation
   *
*************** reparameterize_path(PlannerInfo *root, P
*** 3426,3428 ****
--- 3656,4081 ----
  	}
  	return NULL;
  }
+ 
+ /*
+  * reparameterize_path_by_child
+  * 		Given a path parameterized by the parent of the given relation,
+  * 		translate the path to be parameterized by the given child relation.
+  *
+  * The function creates a new path of the same type as the given path, but
+  * parameterized by the given child relation. If it can not reparameterize the
+  * path as required, it returns NULL.
+  *
+  * The cost, number of rows, width and parallel path properties depend upon
+  * path->parent, which does not change during the translation. Hence those
+  * members are copied as they are.
+  */
+ 
+ Path *
+ reparameterize_path_by_child(PlannerInfo *root, Path *path,
+ 							  RelOptInfo *child_rel)
+ {
+ 
+ #define FLAT_COPY_PATH(newnode, node, nodetype)  \
+ 	( (newnode) = makeNode(nodetype), \
+ 	  memcpy((newnode), (node), sizeof(nodetype)) )
+ 
+ 	Path	   *new_path;
+ 	ParamPathInfo   *new_ppi;
+ 	ParamPathInfo   *old_ppi;
+ 	Relids		required_outer;
+ 
+ 	/*
+ 	 * If the path is not parameterized by the parent of the given relation,
+ 	 * it doesn't need reparameterization.
+ 	 */
+ 	if (!path->param_info ||
+ 		!bms_overlap(PATH_REQ_OUTER(path), child_rel->top_parent_relids))
+ 		return path;
+ 
+ 	/*
+ 	 * Make a copy of the given path and reparameterize or translate the
+ 	 * path specific members.
+ 	 */
+ 	switch (nodeTag(path))
+ 	{
+ 		case T_Path:
+ 			FLAT_COPY_PATH(new_path, path, Path);
+ 			break;
+ 
+ 		case T_IndexPath:
+ 			{
+ 				IndexPath *ipath;
+ 
+ 				FLAT_COPY_PATH(ipath, path, IndexPath);
+ 				ipath->indexclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 												  (Node *) ipath->indexclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				ipath->indexquals = (List *) adjust_appendrel_attrs_multilevel(root,
+ 													(Node *) ipath->indexquals,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) ipath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapHeapPath:
+ 			{
+ 				BitmapHeapPath *bhpath;
+ 
+ 				FLAT_COPY_PATH(bhpath, path, BitmapHeapPath);
+ 				bhpath->bitmapqual = reparameterize_path_by_child(root,
+ 															bhpath->bitmapqual,
+ 																	child_rel);
+ 				new_path = (Path *) bhpath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapAndPath:
+ 			{
+ 				BitmapAndPath *bapath;
+ 				ListCell   *lc;
+ 				List	   *bitmapquals = NIL;
+ 
+ 				FLAT_COPY_PATH(bapath, path, BitmapAndPath);
+ 				foreach (lc, bapath->bitmapquals)
+ 				{
+ 					Path   *bmqpath = lfirst(lc);
+ 
+ 					bitmapquals = lappend(bitmapquals,
+ 										  reparameterize_path_by_child(root,
+ 																	   bmqpath,
+ 																   child_rel));
+ 				}
+ 				bapath->bitmapquals = bitmapquals;
+ 				new_path = (Path *) bapath;
+ 			}
+ 			break;
+ 
+ 		case T_BitmapOrPath:
+ 			{
+ 				BitmapOrPath *bopath;
+ 				ListCell   *lc;
+ 				List	   *bitmapquals = NIL;
+ 
+ 				FLAT_COPY_PATH(bopath, path, BitmapOrPath);
+ 				foreach (lc, bopath->bitmapquals)
+ 				{
+ 					Path   *bmqpath = lfirst(lc);
+ 
+ 					bitmapquals = lappend(bitmapquals,
+ 										  reparameterize_path_by_child(root,
+ 																	   bmqpath,
+ 																   child_rel));
+ 				}
+ 				bopath->bitmapquals = bitmapquals;
+ 				new_path = (Path *) bopath;
+ 			}
+ 			break;
+ 
+ 		case T_TidPath:
+ 			{
+ 				TidPath *tpath;
+ 
+ 				/*
+ 				 * TidPath contains tidquals, which do not contain any external
+ 				 * parameters per create_tidscan_path(). So don't bother to
+ 				 * translate those.
+ 				 */
+ 				FLAT_COPY_PATH(tpath, path, TidPath);
+ 				new_path = (Path *) tpath;
+ 			}
+ 			break;
+ 
+ 		case T_ForeignPath:
+ 			{
+ 				ForeignPath   *fpath;
+ 				ReparameterizeForeignPathByChild_function rfpc_func;
+ 
+ 				FLAT_COPY_PATH(fpath, path, ForeignPath);
+ 				if (fpath->fdw_outerpath)
+ 					fpath->fdw_outerpath = reparameterize_path_by_child(root,
+ 														  fpath->fdw_outerpath,
+ 																	child_rel);
+ 				rfpc_func = path->parent->fdwroutine->ReparameterizeForeignPathByChild;
+ 
+ 				/* Hand over to FDW if supported. */
+ 				if (rfpc_func)
+ 					fpath->fdw_private = rfpc_func(root, fpath->fdw_private,
+ 													child_rel);
+ 				new_path = (Path *) fpath;
+ 			}
+ 			break;
+ 
+ 		case T_CustomPath:
+ 			{
+ 				CustomPath *cpath;
+ 				ListCell   *lc;
+ 				List	   *custompaths = NIL;
+ 
+ 				FLAT_COPY_PATH(cpath, path, CustomPath);
+ 
+ 				foreach (lc, cpath->custom_paths)
+ 				{
+ 					Path   *subpath = lfirst(lc);
+ 
+ 					custompaths = lappend(custompaths,
+ 										  reparameterize_path_by_child(root,
+ 																	   subpath,
+ 																   child_rel));
+ 				}
+ 				cpath->custom_paths = custompaths;
+ 
+ 				if (cpath->methods &&
+ 					cpath->methods->ReparameterizeCustomPathByChild)
+ 					cpath->custom_private = cpath->methods->ReparameterizeCustomPathByChild(root,
+ 														 cpath->custom_private,
+ 														 child_rel);
+ 
+ 				new_path = (Path *) cpath;
+ 			}
+ 			break;
+ 
+ 		case T_NestPath:
+ 			{
+ 				JoinPath *jpath;
+ 
+ 				FLAT_COPY_PATH(jpath, path, NestPath);
+ 
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) jpath;
+ 			}
+ 			break;
+ 
+ 		case T_MergePath:
+ 			{
+ 				JoinPath *jpath;
+ 				MergePath  *mpath;
+ 
+ 				FLAT_COPY_PATH(mpath, path, MergePath);
+ 
+ 				jpath = (JoinPath *) mpath;
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				mpath->path_mergeclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											 (Node *) mpath->path_mergeclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) mpath;
+ 			}
+ 			break;
+ 
+ 		case T_HashPath:
+ 			{
+ 				JoinPath *jpath;
+ 				HashPath   *hpath;
+ 				FLAT_COPY_PATH(hpath, path, HashPath);
+ 
+ 				jpath = (JoinPath *) hpath;
+ 				jpath->outerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->outerjoinpath,
+ 														 child_rel);
+ 				jpath->innerjoinpath = reparameterize_path_by_child(root,
+ 														 jpath->innerjoinpath,
+ 														 child_rel);
+ 				jpath->joinrestrictinfo = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) jpath->joinrestrictinfo,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				hpath->path_hashclauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 											  (Node *) hpath->path_hashclauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) hpath;
+ 			}
+ 			break;
+ 
+ 		case T_AppendPath:
+ 			{
+ 				AppendPath	*apath;
+ 				List		*subpaths = NIL;
+ 				ListCell	*lc;
+ 
+ 				FLAT_COPY_PATH(apath, path, AppendPath);
+ 				foreach (lc, apath->subpaths)
+ 					subpaths = lappend(subpaths,
+ 									   reparameterize_path_by_child(root,
+ 																	lfirst(lc),
+ 																	child_rel));
+ 				apath->subpaths = subpaths;
+ 				new_path = (Path *) apath;
+ 			}
+ 			break;
+ 
+ 		case T_MergeAppendPath:
+ 			{
+ 				MergeAppendPath	*mapath;
+ 				List		*subpaths = NIL;
+ 				ListCell	*lc;
+ 
+ 				FLAT_COPY_PATH(mapath, path, MergeAppendPath);
+ 				foreach (lc, mapath->subpaths)
+ 					subpaths = lappend(subpaths,
+ 									   reparameterize_path_by_child(root,
+ 																	lfirst(lc),
+ 																	child_rel));
+ 				mapath->subpaths = subpaths;
+ 				new_path = (Path *) mapath;
+ 			}
+ 			break;
+ 
+ 		case T_MaterialPath:
+ 			{
+ 				MaterialPath *mpath;
+ 
+ 				FLAT_COPY_PATH(mpath, path, MaterialPath);
+ 				mpath->subpath = reparameterize_path_by_child(root,
+ 															  mpath->subpath,
+ 															  child_rel);
+ 				new_path = (Path *) mpath;
+ 			}
+ 			break;
+ 
+ 		case T_UniquePath:
+ 			{
+ 				UniquePath *upath;
+ 
+ 				FLAT_COPY_PATH(upath, path, UniquePath);
+ 				upath->subpath = reparameterize_path_by_child(root,
+ 															  upath->subpath,
+ 															  child_rel);
+ 				upath->uniq_exprs = (List *) adjust_appendrel_attrs_multilevel(root,
+ 													(Node *) upath->uniq_exprs,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 				new_path = (Path *) upath;
+ 			}
+ 			break;
+ 
+ 		case T_GatherPath:
+ 			{
+ 				GatherPath *gpath;
+ 
+ 				FLAT_COPY_PATH(gpath, path, GatherPath);
+ 				gpath->subpath = reparameterize_path_by_child(root,
+ 															  gpath->subpath,
+ 															  child_rel);
+ 				new_path = (Path *) gpath;
+ 			}
+ 			break;
+ 
+ 		case T_GatherMergePath:
+ 			{
+ 				GatherMergePath *gmpath;
+ 
+ 				FLAT_COPY_PATH(gmpath, path, GatherMergePath);
+ 				gmpath->subpath = reparameterize_path_by_child(root,
+ 															   gmpath->subpath,
+ 															   child_rel);
+ 				new_path = (Path *) gmpath;
+ 			}
+ 			break;
+ 
+ 		case T_SubqueryScanPath:
+ 			/*
+ 			 * Subqueries can't be partitioned right now, so a subquery can not
+ 			 * participate in a partition-wise join and hence can not be seen
+ 			 * here.
+ 			 */
+ 		case T_ResultPath:
+ 			/*
+ 			 * A result path can not have any parameterization, so we
+ 			 * should never see it here.
+ 			 */
+ 		default:
+ 			/* Other kinds of paths can not appear in a join tree. */
+ 			elog(ERROR, "unrecognized path node type %d", (int) nodeTag(path));
+ 
+ 			/* Keep compiler quite about unassigned new_path */
+ 			return NULL;
+ 	}
+ 
+ 	/*
+ 	 * Adjust the parameterization information, which refers to the topmost
+ 	 * parent. The topmost parent can be multiple levels away from the given
+ 	 * child, hence use multi-level expression adjustment routines.
+ 	 */
+ 	old_ppi = new_path->param_info;
+ 	required_outer = adjust_child_relids_multilevel(root,
+ 													old_ppi->ppi_req_outer,
+ 													child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 
+ 	/* If we already have a PPI for this parameterization, just return it */
+ 	new_ppi = find_param_path_info(new_path->parent, required_outer);
+ 
+ 	/*
+ 	 * If not, build a new one and link it to the list of PPIs. When called
+ 	 * during GEQO join planning, we are in a short-lived memory context.  We
+ 	 * must make sure that the new PPI and its contents attached to a baserel
+ 	 * survives the GEQO cycle, else the baserel is trashed for future GEQO
+ 	 * cycles.  On the other hand, when we are adding new PPI to a joinrel
+ 	 * during GEQO, we don't want that to clutter the main planning context.
+ 	 * Upshot is that the best solution is to explicitly allocate new PPI in
+ 	 * the same context the given RelOptInfo is in.
+ 	 */
+ 	if (!new_ppi)
+ 	{
+ 		MemoryContext oldcontext;
+ 		RelOptInfo   *rel = path->parent;
+ 
+ 		oldcontext = MemoryContextSwitchTo(GetMemoryChunkContext(rel));
+ 
+ 		new_ppi = makeNode(ParamPathInfo);
+ 		new_ppi->ppi_req_outer = bms_copy(required_outer);
+ 		new_ppi->ppi_rows = old_ppi->ppi_rows;
+ 		new_ppi->ppi_clauses = (List *) adjust_appendrel_attrs_multilevel(root,
+ 												 (Node *) old_ppi->ppi_clauses,
+ 															 child_rel->relids,
+ 												 child_rel->top_parent_relids);
+ 		rel->ppilist = lappend(rel->ppilist, new_ppi);
+ 
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	bms_free(required_outer);
+ 
+ 	new_path->param_info = new_ppi;
+ 
+ 	/*
+ 	 * Adjust the path target if the parent of the outer relation is referenced
+ 	 * in the targetlist. This can happen when only the parent of outer relation is
+ 	 * laterally referenced in this relation.
+ 	 */
+ 	if (bms_overlap(path->parent->lateral_relids, child_rel->top_parent_relids))
+ 	{
+ 		List	   *exprs;
+ 
+ 		new_path->pathtarget = copy_pathtarget(new_path->pathtarget);
+ 		exprs = new_path->pathtarget->exprs;
+ 		exprs = (List *) adjust_appendrel_attrs_multilevel(root,
+ 														   (Node *) exprs,
+ 														   child_rel->relids,
+ 											   child_rel->top_parent_relids);
+ 		new_path->pathtarget->exprs = exprs;
+ 	}
+ 
+ 	return new_path;
+ }
diff --git a/src/backend/optimizer/util/placeholder.c b/src/backend/optimizer/util/placeholder.c
new file mode 100644
index 698a387..6714288
*** a/src/backend/optimizer/util/placeholder.c
--- b/src/backend/optimizer/util/placeholder.c
***************
*** 20,25 ****
--- 20,26 ----
  #include "optimizer/pathnode.h"
  #include "optimizer/placeholder.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
  #include "optimizer/var.h"
  #include "utils/lsyscache.h"
  
*************** add_placeholders_to_joinrel(PlannerInfo
*** 414,419 ****
--- 415,424 ----
  	Relids		relids = joinrel->relids;
  	ListCell   *lc;
  
+ 	/* This function is called only on the parent relations. */
+ 	Assert(!IS_OTHER_REL(joinrel) && !IS_OTHER_REL(outer_rel) &&
+ 		   !IS_OTHER_REL(inner_rel));
+ 
  	foreach(lc, root->placeholder_list)
  	{
  		PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
*************** add_placeholders_to_joinrel(PlannerInfo
*** 459,461 ****
--- 464,518 ----
  		}
  	}
  }
+ 
+ /*
+  * add_placeholders_to_child_joinrel
+  *		Translate the PHVs in parent's targetlist and add them to the child's
+  *		targetlist. Also adjust the cost and width.
+  */
+ void
+ add_placeholders_to_child_joinrel(PlannerInfo *root, RelOptInfo *childrel,
+ 								  RelOptInfo *parentrel)
+ {
+ 	ListCell  *lc;
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 
+ 
+ 	Assert(IS_JOIN_REL(childrel) && IS_JOIN_REL(parentrel));
+ 
+ 	/* Ensure the child relation is really what it claims to be. */
+ 	Assert(IS_OTHER_REL(childrel));
+ 
+ 	appinfos = find_appinfos_by_relids(root, childrel->relids, &nappinfos);
+ 	foreach (lc, parentrel->reltarget->exprs)
+ 	{
+ 		PlaceHolderVar *phv = lfirst(lc);
+ 
+ 		if (IsA(phv, PlaceHolderVar))
+ 		{
+ 			/*
+ 			 * In case the placeholder Var refers to any of the parent
+ 			 * relations, translate it to refer to the corresponding child.
+ 			 */
+ 			if (bms_overlap(phv->phrels, parentrel->relids) &&
+ 				childrel->reloptkind == RELOPT_OTHER_JOINREL)
+ 			{
+ 				phv = (PlaceHolderVar *) adjust_appendrel_attrs(root,
+ 															  (Node *) phv,
+ 																 nappinfos,
+ 																 appinfos);
+ 			}
+ 
+ 			childrel->reltarget->exprs = lappend(childrel->reltarget->exprs,
+ 												 phv);
+ 		}
+ 	}
+ 
+ 	/* Adjust the cost and width of child targetlist. */
+ 	childrel->reltarget->cost.startup = parentrel->reltarget->cost.startup;
+ 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
+ 	childrel->reltarget->width = parentrel->reltarget->width;
+ 
+ 	pfree(appinfos);
+ }
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
new file mode 100644
index 9207c8d..7e846e1
*** a/src/backend/optimizer/util/plancat.c
--- b/src/backend/optimizer/util/plancat.c
***************
*** 27,32 ****
--- 27,33 ----
  #include "catalog/catalog.h"
  #include "catalog/dependency.h"
  #include "catalog/heap.h"
+ #include "catalog/pg_inherits_fn.h"
  #include "catalog/partition.h"
  #include "catalog/pg_am.h"
  #include "catalog/pg_statistic_ext.h"
*************** static List *get_relation_constraints(Pl
*** 68,73 ****
--- 69,80 ----
  static List *build_index_tlist(PlannerInfo *root, IndexOptInfo *index,
  				  Relation heapRelation);
  static List *get_relation_statistics(RelOptInfo *rel, Relation relation);
+ static List **build_baserel_partition_key_exprs(Relation relation,
+ 												Index varno);
+ static PartitionScheme find_partition_scheme(struct PlannerInfo *root,
+ 											 Relation rel);
+ static void get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ 							Relation relation);
  
  /*
   * get_relation_info -
*************** get_relation_info(PlannerInfo *root, Oid
*** 420,425 ****
--- 427,436 ----
  	/* Collect info about relation's foreign keys, if relevant */
  	get_relation_foreign_keys(root, rel, relation, inhparent);
  
+ 	/* Collect info about relation's partitioning scheme, if any. */
+ 	if (inhparent)
+ 		get_relation_partition_info(root, rel, relation);
+ 
  	heap_close(relation, NoLock);
  
  	/*
*************** has_row_triggers(PlannerInfo *root, Inde
*** 1801,1803 ****
--- 1812,1975 ----
  	heap_close(relation, NoLock);
  	return result;
  }
+ 
+ /*
+  * get_relation_partition_info
+  *
+  * Retrieves partitioning information for a given relation.
+  *
+  * Partitioning scheme, partition key expressions and OIDs of partitions are
+  * added to the given RelOptInfo. A partitioned table can participate in the
+  * query as a simple relation or an inheritance parent. Only the latter can have
+  * child relations, and hence partitions. From the point of view of the query
+  * optimizer only such relations are considered to be partitioned. Hence
+  * partitioning information is set only for an inheritance parent.
+  */
+ static void
+ get_relation_partition_info(PlannerInfo *root, RelOptInfo *rel,
+ 							Relation relation)
+ {
+ 	PartitionDesc	part_desc = RelationGetPartitionDesc(relation);
+ 
+ 	/* No partitioning information for an unpartitioned relation. */
+ 	if (relation->rd_rel->relkind != RELKIND_PARTITIONED_TABLE ||
+ 		!(rel->part_scheme = find_partition_scheme(root, relation)))
+ 		return;
+ 
+ 	Assert(part_desc);
+ 	rel->nparts = part_desc->nparts;
+ 	rel->boundinfo = part_desc->boundinfo;
+ 	rel->partexprs = build_baserel_partition_key_exprs(relation, rel->relid);
+ 	rel->part_oids = part_desc->oids;
+ 
+ 	Assert(rel->nparts > 0 && rel->boundinfo && rel->part_oids);
+ 	return;
+ }
+ 
+ /*
+  * find_partition_scheme
+  *
+  * The function returns a canonical partition scheme which exactly matches the
+  * partitioning properties of the given relation if one exists in the list of
+  * canonical partitioning schemes maintained in PlannerInfo. If none of the
+  * existing partitioning schemes match, the function creates a canonical
+  * partition scheme and adds it to the list.
+  *
+  * For an unpartitioned table or for a multi-level partitioned table it returns
+  * NULL. See comments in the function for more details.
+  */
+ static PartitionScheme
+ find_partition_scheme(PlannerInfo *root, Relation relation)
+ {
+ 	PartitionKey	part_key = RelationGetPartitionKey(relation);
+ 	ListCell	   *lc;
+ 	int		partnatts;
+ 	PartitionScheme	part_scheme = NULL;
+ 
+ 	/* No partition scheme for an unpartitioned relation. */
+ 	if (!part_key)
+ 		return NULL;
+ 
+ 	partnatts = part_key->partnatts;
+ 
+ 	/* Search for a matching partition scheme and return it if one is found. */
+ 	foreach (lc, root->part_schemes)
+ 	{
+ 		part_scheme = lfirst(lc);
+ 
+ 		/* Match partitioning strategy and number of keys. */
+ 		if (part_key->strategy != part_scheme->strategy ||
+ 			partnatts != part_scheme->partnatts)
+ 			continue;
+ 
+ 		/* Match the partition key types. */
+ 		if (memcmp(part_key->partopfamily, part_scheme->partopfamily,
+ 				   sizeof(Oid) * partnatts) != 0 ||
+ 			memcmp(part_key->partopcintype, part_scheme->partopcintype,
+ 				   sizeof(Oid) * partnatts) != 0 ||
+ 			memcmp(part_key->parttypcoll, part_scheme->parttypcoll,
+ 				   sizeof(Oid) * partnatts) != 0)
+ 			continue;
+ 
+ 		/* Found matching partition scheme. */
+ 		return part_scheme;
+ 	}
+ 
+ 	/* Did not find matching partition scheme. Create one. */
+ 	part_scheme = (PartitionScheme) palloc0(sizeof(PartitionSchemeData));
+ 
+ 	part_scheme->strategy = part_key->strategy;
+ 	/* Store partition key information. */
+ 	part_scheme->partnatts = part_key->partnatts;
+ 	part_scheme->partopfamily = part_key->partopfamily;
+ 	part_scheme->partopcintype = part_key->partopcintype;
+ 	part_scheme->parttypcoll = part_key->parttypcoll;
+ 	part_scheme->partsupfunc = part_key->partsupfunc;
+ 
+ 	/* Add the partitioning scheme to PlannerInfo. */
+ 	root->part_schemes = lappend(root->part_schemes, part_scheme);
+ 
+ 	return part_scheme;
+ }
+ 
+ /*
+  * build_baserel_partition_key_exprs
+  *
+  * Collect partition key expressions for a given base relation. The function
+  * converts any single column partition keys into corresponding Var nodes. It
+  * restamps Var nodes in partition key expressions by given varno. The
+  * partition key expressions are returned as an array of single element lists
+  * to be stored in RelOptInfo of the base relation.
+  */
+ static List **
+ build_baserel_partition_key_exprs(Relation relation, Index varno)
+ {
+ 	PartitionKey	part_key = RelationGetPartitionKey(relation);
+ 	int		num_pkexprs;
+ 	int		cnt_pke;
+ 	List	  **partexprs;
+ 	ListCell   *lc;
+ 
+ 	if (!part_key || part_key->partnatts <= 0)
+ 		return NULL;
+ 
+ 	num_pkexprs = part_key->partnatts;
+ 	partexprs = (List **) palloc(sizeof(List *) * num_pkexprs);
+ 	lc = list_head(part_key->partexprs);
+ 
+ 	for (cnt_pke = 0; cnt_pke < num_pkexprs; cnt_pke++)
+ 	{
+ 		AttrNumber attno = part_key->partattrs[cnt_pke];
+ 		Expr	  *pkexpr;
+ 
+ 		if (attno != InvalidAttrNumber)
+ 		{
+ 			/* Single column partition key is stored as a Var node. */
+ 			Form_pg_attribute att_tup;
+ 
+ 			if (attno < 0)
+ 				att_tup = SystemAttributeDefinition(attno,
+ 												 relation->rd_rel->relhasoids);
+ 			else
+ 				att_tup = relation->rd_att->attrs[attno - 1];
+ 
+ 			pkexpr = (Expr *) makeVar(varno, attno, att_tup->atttypid,
+ 									  att_tup->atttypmod,
+ 									  att_tup->attcollation, 0);
+ 		}
+ 		else
+ 		{
+ 			if (lc == NULL)
+ 				elog(ERROR, "wrong number of partition key expressions");
+ 
+ 			/* Re-stamp the expression with given varno. */
+ 			pkexpr = (Expr *) copyObject(lfirst(lc));
+ 			ChangeVarNodes((Node *) pkexpr, 1, varno, 0);
+ 			lc = lnext(lc);
+ 		}
+ 
+ 		partexprs[cnt_pke] = list_make1(pkexpr);
+ 	}
+ 
+ 	return partexprs;
+ }
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
new file mode 100644
index 342d884..308bdec
*** a/src/backend/optimizer/util/relnode.c
--- b/src/backend/optimizer/util/relnode.c
***************
*** 23,30 ****
--- 23,32 ----
  #include "optimizer/paths.h"
  #include "optimizer/placeholder.h"
  #include "optimizer/plancat.h"
+ #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/tlist.h"
+ #include "optimizer/var.h"
  #include "utils/hsearch.h"
  
  
*************** typedef struct JoinHashEntry
*** 35,41 ****
  } JoinHashEntry;
  
  static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel);
  static List *build_joinrel_restrictlist(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel,
--- 37,43 ----
  } JoinHashEntry;
  
  static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 								RelOptInfo *input_rel, bool grouped);
  static List *build_joinrel_restrictlist(PlannerInfo *root,
  						   RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel,
*************** static List *subbuild_joinrel_joinlist(R
*** 52,57 ****
--- 54,64 ----
  static void set_foreign_rel_properties(RelOptInfo *joinrel,
  						   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
  static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
+ extern ParamPathInfo *find_param_path_info(RelOptInfo *rel,
+ 									  Relids required_outer);
+ static void build_joinrel_partition_info(RelOptInfo *joinrel,
+ 							 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+ 							 List *restrictlist, JoinType jointype);
  
  
  /*
*************** build_simple_rel(PlannerInfo *root, int
*** 120,125 ****
--- 127,133 ----
  	rel->cheapest_parameterized_paths = NIL;
  	rel->direct_lateral_relids = NULL;
  	rel->lateral_relids = NULL;
+ 	rel->gpi = NULL;
  	rel->relid = relid;
  	rel->rtekind = rte->rtekind;
  	/* min_attr, max_attr, attr_needed, attr_widths are set below */
*************** build_simple_rel(PlannerInfo *root, int
*** 146,151 ****
--- 154,164 ----
  	rel->baserestrict_min_security = UINT_MAX;
  	rel->joininfo = NIL;
  	rel->has_eclass_joins = false;
+ 	rel->part_scheme = NULL;
+ 	rel->nparts = 0;
+ 	rel->boundinfo = NULL;
+ 	rel->partexprs = NULL;
+ 	rel->part_rels = NULL;
  
  	/*
  	 * Pass top parent's relids down the inheritance hierarchy. If the parent
*************** build_simple_rel(PlannerInfo *root, int
*** 218,237 ****
  	if (rte->inh)
  	{
  		ListCell   *l;
  
  		foreach(l, root->append_rel_list)
  		{
  			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
  
  			/* append_rel_list contains all append rels; ignore others */
  			if (appinfo->parent_relid != relid)
  				continue;
  
! 			(void) build_simple_rel(root, appinfo->child_relid,
! 									rel);
  		}
  	}
  
  	return rel;
  }
  
--- 231,293 ----
  	if (rte->inh)
  	{
  		ListCell   *l;
+ 		int			nparts = rel->nparts;
+ 
+ 		if (nparts > 0)
+ 			rel->part_rels = (RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
  
  		foreach(l, root->append_rel_list)
  		{
  			AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
+ 			RelOptInfo *childrel;
+ 			int			cnt_parts;
+ 			RangeTblEntry *childRTE;
  
  			/* append_rel_list contains all append rels; ignore others */
  			if (appinfo->parent_relid != relid)
  				continue;
  
! 			childrel = build_simple_rel(root, appinfo->child_relid,
! 										 rel);
! 
! 			/* Nothing more to do for an unpartitioned table. */
! 			if (!rel->part_scheme)
! 				continue;
! 
! 			childRTE = root->simple_rte_array[appinfo->child_relid];
! 			/*
! 			 * Two partitioned tables with the same partitioning scheme have
! 			 * their partition bounds arranged in the same order. The order of
! 			 * partition OIDs in RelOptInfo corresponds to the partition bound
! 			 * order. Thus the OIDs of matching partitions from both the tables
! 			 * are placed at the same position in the array of partition OIDs
! 			 * in the respective RelOptInfos. Arranging RelOptInfos of
! 			 * partitions in the same order as their OIDs makes it easy to find
! 			 * the RelOptInfos of matching partitions for partition-wise join.
! 			 */
! 			for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
! 			{
! 				if (rel->part_oids[cnt_parts] == childRTE->relid)
! 				{
! 					Assert(!rel->part_rels[cnt_parts]);
! 					rel->part_rels[cnt_parts] = childrel;
! 					break;
! 				}
! 			}
  		}
  	}
  
+ 	/* Should have found all the childrels of a partitioned relation. */
+ 	if (rel->part_scheme)
+ 	{
+ 		int		cnt_parts;
+ 
+ 		for (cnt_parts = 0; cnt_parts < rel->nparts; cnt_parts++)
+ 			if (!rel->part_rels[cnt_parts])
+ 				elog(ERROR, "could not find the RelOptInfo of a partition with oid %u",
+ 					 rel->part_oids[cnt_parts]);
+ 	}
+ 
  	return rel;
  }
  
*************** build_join_rel(PlannerInfo *root,
*** 453,458 ****
--- 509,517 ----
  	RelOptInfo *joinrel;
  	List	   *restrictlist;
  
+ 	/* This function should be used only for join between parents. */
+ 	Assert(!IS_OTHER_REL(outer_rel) && !IS_OTHER_REL(inner_rel));
+ 
  	/*
  	 * See if we already have a joinrel for this set of base rels.
  	 */
*************** build_join_rel(PlannerInfo *root,
*** 497,502 ****
--- 556,562 ----
  				  inner_rel->direct_lateral_relids);
  	joinrel->lateral_relids = min_join_parameterization(root, joinrel->relids,
  														outer_rel, inner_rel);
+ 	joinrel->gpi = NULL;
  	joinrel->relid = 0;			/* indicates not a baserel */
  	joinrel->rtekind = RTE_JOIN;
  	joinrel->min_attr = 0;
*************** build_join_rel(PlannerInfo *root,
*** 527,532 ****
--- 587,597 ----
  	joinrel->joininfo = NIL;
  	joinrel->has_eclass_joins = false;
  	joinrel->top_parent_relids = NULL;
+ 	joinrel->part_scheme = NULL;
+ 	joinrel->nparts = 0;
+ 	joinrel->boundinfo = NULL;
+ 	joinrel->partexprs = NULL;
+ 	joinrel->part_rels = NULL;
  
  	/* Compute information relevant to the foreign relations. */
  	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
*************** build_join_rel(PlannerInfo *root,
*** 539,548 ****
  	 * and inner rels we first try to build it from.  But the contents should
  	 * be the same regardless.
  	 */
! 	build_joinrel_tlist(root, joinrel, outer_rel);
! 	build_joinrel_tlist(root, joinrel, inner_rel);
  	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
  
  	/*
  	 * add_placeholders_to_joinrel also took care of adding the ph_lateral
  	 * sets of any PlaceHolderVars computed here to direct_lateral_relids, so
--- 604,620 ----
  	 * and inner rels we first try to build it from.  But the contents should
  	 * be the same regardless.
  	 */
! 	build_joinrel_tlist(root, joinrel, outer_rel, false);
! 	build_joinrel_tlist(root, joinrel, inner_rel, false);
  	add_placeholders_to_joinrel(root, joinrel, outer_rel, inner_rel);
  
+ 	/* Try to build grouped target. */
+ 	/*
+ 	 * TODO Consider if placeholders make sense here. If not, also make the
+ 	 * related code below conditional.
+ 	 */
+ 	prepare_rel_for_grouping(root, joinrel);
+ 
  	/*
  	 * add_placeholders_to_joinrel also took care of adding the ph_lateral
  	 * sets of any PlaceHolderVars computed here to direct_lateral_relids, so
*************** build_join_rel(PlannerInfo *root,
*** 572,577 ****
--- 644,653 ----
  	 */
  	joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);
  
+ 	/* Store the partition information. */
+ 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
+ 								 sjinfo->jointype);
+ 
  	/*
  	 * Set estimates of the joinrel's size.
  	 */
*************** build_join_rel(PlannerInfo *root,
*** 617,622 ****
--- 693,845 ----
  	return joinrel;
  }
  
+  /*
+  * build_child_join_rel
+  *		Builds RelOptInfo for joining given two child relations from RelOptInfo
+  *		representing the join between their parents.
+  *
+  * 'outer_rel' and 'inner_rel' are the RelOptInfos of child relations being
+  *		joined.
+  * 'parent_joinrel' is the RelOptInfo representing the join between parent
+  *		relations. Most of the members of new RelOptInfo are produced by
+  *		translating corresponding members of this RelOptInfo.
+  * 'sjinfo': context info for child join
+  * 'restrictlist': list of RestrictInfo nodes that apply to this particular
+  *		pair of joinable relations.
+  * 'jointype': the type of join being performed between the two child
+  *		relations.
+  */
+ RelOptInfo *
+ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
+ 					 RelOptInfo *inner_rel, RelOptInfo *parent_joinrel,
+ 					 List *restrictlist, SpecialJoinInfo *sjinfo,
+ 					 JoinType jointype)
+ {
+ 	RelOptInfo *joinrel = makeNode(RelOptInfo);
+ 	AppendRelInfo **appinfos;
+ 	int		nappinfos;
+ 
+ 	/* Only joins between other relations land here. */
+ 	Assert(IS_OTHER_REL(outer_rel) && IS_OTHER_REL(inner_rel));
+ 
+ 	joinrel->reloptkind = RELOPT_OTHER_JOINREL;
+ 	joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids);
+ 	joinrel->rows = 0;
+ 	/* cheap startup cost is interesting iff not all tuples to be retrieved */
+ 	joinrel->consider_startup = (root->tuple_fraction > 0);
+ 	joinrel->consider_param_startup = false;
+ 	joinrel->consider_parallel = false;
+ 	joinrel->reltarget = create_empty_pathtarget();
+ 	joinrel->pathlist = NIL;
+ 	joinrel->ppilist = NIL;
+ 	joinrel->partial_pathlist = NIL;
+ 	joinrel->cheapest_startup_path = NULL;
+ 	joinrel->cheapest_total_path = NULL;
+ 	joinrel->cheapest_unique_path = NULL;
+ 	joinrel->cheapest_parameterized_paths = NIL;
+ 	joinrel->direct_lateral_relids = NULL;
+ 	joinrel->lateral_relids = NULL;
+ 	joinrel->gpi = makeNode(GroupedPathInfo);
+ 	if (parent_joinrel->gpi)
+ 		/*
+ 		 * Translation into child varnos will take place along with other
+ 		 * translations, see try_partition_wise_join.
+ 		 */
+ 		joinrel->gpi->target = copy_pathtarget(parent_joinrel->gpi->target);
+ 	joinrel->relid = 0;			/* indicates not a baserel */
+ 	joinrel->rtekind = RTE_JOIN;
+ 	joinrel->min_attr = 0;
+ 	joinrel->max_attr = 0;
+ 	joinrel->attr_needed = NULL;
+ 	joinrel->attr_widths = NULL;
+ 	joinrel->lateral_vars = NIL;
+ 	joinrel->lateral_referencers = NULL;
+ 	joinrel->indexlist = NIL;
+ 	joinrel->pages = 0;
+ 	joinrel->tuples = 0;
+ 	joinrel->allvisfrac = 0;
+ 	joinrel->subroot = NULL;
+ 	joinrel->subplan_params = NIL;
+ 	joinrel->serverid = InvalidOid;
+ 	joinrel->userid = InvalidOid;
+ 	joinrel->useridiscurrent = false;
+ 	joinrel->fdwroutine = NULL;
+ 	joinrel->fdw_private = NULL;
+ 	joinrel->baserestrictinfo = NIL;
+ 	joinrel->baserestrictcost.startup = 0;
+ 	joinrel->baserestrictcost.per_tuple = 0;
+ 	joinrel->joininfo = NIL;
+ 	joinrel->has_eclass_joins = false;
+ 	joinrel->top_parent_relids = NULL;
+ 	joinrel->part_scheme = NULL;
+ 	joinrel->part_rels = NULL;
+ 	joinrel->partexprs = NULL;
+ 
+ 	joinrel->top_parent_relids = bms_union(outer_rel->top_parent_relids,
+ 										   inner_rel->top_parent_relids);
+ 
+ 	/* Compute information relevant to foreign relations. */
+ 	set_foreign_rel_properties(joinrel, outer_rel, inner_rel);
+ 
+ 	/* Build targetlist */
+ 	build_joinrel_tlist(root, joinrel, outer_rel, false);
+ 	build_joinrel_tlist(root, joinrel, inner_rel, false);
+ 	/* Add placeholder variables. */
+ 	add_placeholders_to_child_joinrel(root, joinrel, parent_joinrel);
+ 
+ 	/* Try to build grouped target. */
+ 	/*
+ 	 * TODO Consider if placeholders make sense here. If not, also make the
+ 	 * related code below conditional.
+ 	 */
+ 	prepare_rel_for_grouping(root, joinrel);
+ 
+ 
+ 	/* Construct joininfo list. */
+ 	appinfos = find_appinfos_by_relids(root, joinrel->relids, &nappinfos);
+ 	joinrel->joininfo = (List *) adjust_appendrel_attrs(root,
+ 											 (Node *) parent_joinrel->joininfo,
+ 																	 nappinfos,
+ 																	 appinfos);
+ 	pfree(appinfos);
+ 
+ 	/*
+ 	 * Lateral relids referred to in the child join are the same as those
+ 	 * referred to in the parent relation. Throw away any partial result
+ 	 * computed while building the targetlist.
+ 	 */
+ 	bms_free(joinrel->direct_lateral_relids);
+ 	bms_free(joinrel->lateral_relids);
+ 	joinrel->direct_lateral_relids = (Relids) bms_copy(parent_joinrel->direct_lateral_relids);
+ 	joinrel->lateral_relids = (Relids) bms_copy(parent_joinrel->lateral_relids);
+ 
+ 	/*
+ 	 * If the parent joinrel has pending equivalence classes, so does the
+ 	 * child.
+ 	 */
+ 	joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;
+ 
+ 	/* Is the join between partitions itself partitioned? */
+ 	build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
+ 								 jointype);
+ 
+ 	/* Child joinrel is parallel safe if parent is parallel safe. */
+ 	joinrel->consider_parallel = parent_joinrel->consider_parallel;
+ 
+ 
+ 	/* Set estimates of the child-joinrel's size. */
+ 	set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
+ 							   sjinfo, restrictlist);
+ 
+ 	/* We build the join only once. */
+ 	Assert(!find_join_rel(root, joinrel->relids));
+ 
+ 	/* Add the relation to the PlannerInfo. */
+ 	add_join_rel(root, joinrel);
+ 
+ 	return joinrel;
+ }
+ 
  /*
   * min_join_parameterization
   *
*************** min_join_parameterization(PlannerInfo *r
*** 670,679 ****
   */
  static void
  build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel)
  {
! 	Relids		relids = joinrel->relids;
  	ListCell   *vars;
  
  	foreach(vars, input_rel->reltarget->exprs)
  	{
--- 893,932 ----
   */
  static void
  build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
! 					RelOptInfo *input_rel, bool grouped)
  {
! 	Relids		relids;
! 	PathTarget  *input_target, *result;
  	ListCell   *vars;
+ 	int			i = -1;
+ 
+ 	/* attr_needed refers to parent relids and not those of a child. */
+ 	if (joinrel->top_parent_relids)
+ 		relids = joinrel->top_parent_relids;
+ 	else
+ 		relids = joinrel->relids;
+ 
+  	if (!grouped)
+  	{
+  		input_target = input_rel->reltarget;
+  		result = joinrel->reltarget;
+  	}
+  	else
+  	{
+  		if (input_rel->gpi != NULL)
+  		{
+  			input_target = input_rel->gpi->target;
+  			Assert(input_target != NULL);
+  		}
+  		else
+  			input_target = input_rel->reltarget;
+ 
+  		/* Caller should have initialized this. */
+  		Assert(joinrel->gpi != NULL);
+ 
+  		/* Default to the plain target. */
+  		result = joinrel->gpi->target;
+  	}
  
  	foreach(vars, input_rel->reltarget->exprs)
  	{
*************** build_joinrel_tlist(PlannerInfo *root, R
*** 690,713 ****
  
  		/*
  		 * Otherwise, anything in a baserel or joinrel targetlist ought to be
! 		 * a Var.  (More general cases can only appear in appendrel child
! 		 * rels, which will never be seen here.)
  		 */
! 		if (!IsA(var, Var))
  			elog(ERROR, "unexpected node type in rel targetlist: %d",
  				 (int) nodeTag(var));
  
- 		/* Get the Var's original base rel */
- 		baserel = find_base_rel(root, var->varno);
- 
- 		/* Is it still needed above this joinrel? */
- 		ndx = var->varattno - baserel->min_attr;
  		if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
  		{
  			/* Yup, add it to the output */
! 			joinrel->reltarget->exprs = lappend(joinrel->reltarget->exprs, var);
! 			/* Vars have cost zero, so no need to adjust reltarget->cost */
! 			joinrel->reltarget->width += baserel->attr_widths[ndx];
  		}
  	}
  }
--- 943,1009 ----
  
  		/*
  		 * Otherwise, anything in a baserel or joinrel targetlist ought to be
! 		 * a Var or ConvertRowtypeExpr introduced while translating parent
! 		 * targetlist to that of the child.
  		 */
! 		if (IsA(var, Var))
! 		{
! 			/* Get the Var's original base rel */
! 			baserel = find_base_rel(root, var->varno);
! 
! 			/* Is it still needed above this joinrel? */
! 			ndx = var->varattno - baserel->min_attr;
! 		}
! 		else if (IsA(var, ConvertRowtypeExpr))
! 		{
! 			ConvertRowtypeExpr *child_expr = (ConvertRowtypeExpr *) var;
! 			Var	 *childvar = (Var *) child_expr->arg;
! 
! 			/*
! 			 * A child's whole-row reference is converted to that of its parent
! 			 * using ConvertRowtypeExpr. There can be as many
! 			 * ConvertRowtypeExpr decorations as the depth of the partition tree.
! 			 * The argument to the deepest ConvertRowtypeExpr is expected to be a
! 			 * whole-row reference of the child.
! 			 */
! 			while (IsA(childvar, ConvertRowtypeExpr))
! 			{
! 				child_expr = (ConvertRowtypeExpr *) childvar;
! 				childvar = (Var *) child_expr->arg;
! 			}
! 			Assert(IsA(childvar, Var) && childvar->varattno == 0);
! 
! 			baserel = find_base_rel(root, childvar->varno);
! 			ndx = 0 - baserel->min_attr;
! 		}
! 		else
  			elog(ERROR, "unexpected node type in rel targetlist: %d",
  				 (int) nodeTag(var));
  
  		if (bms_nonempty_difference(baserel->attr_needed[ndx], relids))
  		{
+ 			Index sortgroupref = 0;
+ 
  			/* Yup, add it to the output */
! 			if (input_target->sortgrouprefs)
! 				sortgroupref = input_target->sortgrouprefs[i];
! 
! 			/*
! 			 * Even if not used for grouping in the input path (the input path
! 			 * is not necessarily grouped), it might be useful for grouping
! 			 * higher in the join tree.
! 			 */
! 			if (sortgroupref == 0)
! 				sortgroupref = get_expr_sortgroupref(root, (Expr *) var);
! 
! 			add_column_to_pathtarget(result, (Expr *) var, sortgroupref);
! 
! 			/*
! 			 * Vars have cost zero, so no need to adjust reltarget->cost. Even
! 			 * if it's a ConvertRowtypeExpr, it will be computed only for the
! 			 * base relation, costing nothing for a join.
! 			 */
! 			result->width += baserel->attr_widths[ndx];
  		}
  	}
  }
*************** subbuild_joinrel_joinlist(RelOptInfo *jo
*** 843,848 ****
--- 1139,1147 ----
  {
  	ListCell   *l;
  
+ 	/* Expected to be called only for join between parent relations. */
+ 	Assert(joinrel->reloptkind == RELOPT_JOINREL);
+ 
  	foreach(l, joininfo_list)
  	{
  		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1048,1059 ****
  	Assert(!bms_overlap(baserel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, baserel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/*
  	 * Identify all joinclauses that are movable to this base rel given this
--- 1347,1354 ----
  	Assert(!bms_overlap(baserel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(baserel, required_outer)))
! 		return ppi;
  
  	/*
  	 * Identify all joinclauses that are movable to this base rel given this
*************** get_baserel_parampathinfo(PlannerInfo *r
*** 1095,1100 ****
--- 1390,1545 ----
  }
  
  /*
+  * If the relation can produce grouped paths, create GroupedPathInfo for it
+  * and create target for the grouped paths.
+  */
+ void
+ prepare_rel_for_grouping(PlannerInfo *root, RelOptInfo *rel)
+ {
+ 	List	*rel_aggregates;
+ 	Relids	rel_agg_attrs = NULL;
+ 	List	*rel_agg_vars = NIL;
+ 	bool	found_higher;
+ 	ListCell	*lc;
+ 	PathTarget	*target_grouped;
+ 
+ 	if (rel->relid > 0)
+ 	{
+ 		RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+ 
+ 		/*
+ 		 * rtekind != RTE_RELATION case is not supported yet.
+ 		 */
+ 		if (rte->rtekind != RTE_RELATION)
+ 			return;
+ 	}
+ 
+ 	/* Caller should only pass base relations or joins. */
+ 	Assert(rel->reloptkind == RELOPT_BASEREL ||
+ 		   rel->reloptkind == RELOPT_JOINREL ||
+ 		   rel->reloptkind == RELOPT_OTHER_JOINREL);
+ 
+ 	/*
+ 	 * If any outer join can set the attribute value to NULL, the aggregate
+ 	 * would receive different input at the base rel level.
+ 	 *
+ 	 * TODO For RELOPT_JOINREL, do not return if all the joins that can set
+ 	 * any entry of the grouped target (do we need to postpone this check
+ 	 * until the grouped target is available, and should create_grouped_target
+ 	 * take care?) of this rel to NULL are provably below rel. (It's ok if rel
+ 	 * is one of these joins.)
+ 	 */
+ 	if (bms_overlap(rel->relids, root->nullable_baserels))
+ 		return;
+ 
+ 	/*
+ 	 * Check if some aggregates can be evaluated in this relation's target,
+ 	 * and collect all vars referenced by these aggregates.
+ 	 */
+ 	rel_aggregates = NIL;
+ 	found_higher = false;
+ 	foreach(lc, root->grouped_var_list)
+ 	{
+ 		GroupedVarInfo	*gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 
+ 		/*
+ 		 * The subset test also passes when gv_eval_at is uninitialized (empty),
+ 		 * which typically means Aggref.aggstar.
+ 		 */
+ 		if (bms_is_subset(gvi->gv_eval_at, rel->relids))
+ 		{
+ 			Aggref	*aggref = castNode(Aggref, gvi->gvexpr);
+ 
+ 			/*
+ 			 * Accept the aggregate.
+ 			 *
+ 			 * GroupedVarInfo is more convenient for the next processing than
+ 			 * Aggref, see add_aggregates_to_grouped_target.
+ 			 */
+ 			rel_aggregates = lappend(rel_aggregates, gvi);
+ 
+ 			if (rel->relid > 0)
+ 			{
+ 				/*
+ 				 * Simple relation. Collect attributes referenced by the
+ 				 * aggregate arguments.
+ 				 */
+ 				pull_varattnos((Node *) aggref, rel->relid, &rel_agg_attrs);
+ 			}
+ 			else
+ 			{
+ 				List	*agg_vars;
+ 
+ 				/*
+ 				 * Join. Collect vars referenced by the aggregate
+ 				 * arguments.
+ 				 */
+ 				/*
+ 				 * TODO Can any argument contain PHVs? And if so, does it matter?
+ 				 * Consider PVC_INCLUDE_PLACEHOLDERS | PVC_RECURSE_PLACEHOLDERS.
+ 				 */
+ 				agg_vars = pull_var_clause((Node *) aggref,
+ 										   PVC_RECURSE_AGGREGATES);
+ 				rel_agg_vars = list_concat(rel_agg_vars, agg_vars);
+ 			}
+ 		}
+ 		else if (bms_overlap(gvi->gv_eval_at, rel->relids))
+ 		{
+ 			/*
+ 			 * Remember that there is at least one aggregate that needs more
+ 			 * than this rel.
+ 			 */
+ 			found_higher = true;
+ 		}
+ 	}
+ 
+ 	/*
+ 	 * Grouping makes little sense without an aggregate function.
+ 	 */
+ 	if (rel_aggregates == NIL)
+ 	{
+ 		bms_free(rel_agg_attrs);
+ 		return;
+ 	}
+ 
+ 	if (found_higher)
+ 	{
+ 		/*
+ 		 * If some aggregate(s) need only this rel but some other need
+ 		 * multiple relations including the current one, grouping of the
+ 		 * current rel could steal some input variables from the "higher
+ 		 * aggregate" (besides decreasing the number of input rows).
+ 		 */
+ 		list_free(rel_aggregates);
+ 		bms_free(rel_agg_attrs);
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * If rel->reltarget can be used for aggregation, mark the relation as
+ 	 * capable of grouping.
+ 	 */
+ 	Assert(rel->gpi == NULL);
+ 	target_grouped = create_grouped_target(root, rel, rel_agg_attrs,
+ 										   rel_agg_vars);
+ 	if (target_grouped != NULL)
+ 	{
+ 		GroupedPathInfo	*gpi;
+ 
+ 		gpi = makeNode(GroupedPathInfo);
+ 		gpi->target = copy_pathtarget(target_grouped);
+ 		gpi->pathlist = NIL;
+ 		gpi->partial_pathlist = NIL;
+ 		rel->gpi = gpi;
+ 
+ 		/*
+ 		 * Add aggregates (in the form of GroupedVar) to the target.
+ 		 */
+ 		add_aggregates_to_target(root, gpi->target, rel_aggregates, rel);
+ 	}
+ }
+ 
+ /*
   * get_joinrel_parampathinfo
   *		Get the ParamPathInfo for a parameterized path for a join relation,
   *		constructing one if we don't have one already.
*************** get_joinrel_parampathinfo(PlannerInfo *r
*** 1290,1301 ****
  	*restrict_clauses = list_concat(pclauses, *restrict_clauses);
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, joinrel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/* Estimate the number of rows returned by the parameterized join */
  	rows = get_parameterized_joinrel_size(root, joinrel,
--- 1735,1742 ----
  	*restrict_clauses = list_concat(pclauses, *restrict_clauses);
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(joinrel, required_outer)))
! 		return ppi;
  
  	/* Estimate the number of rows returned by the parameterized join */
  	rows = get_parameterized_joinrel_size(root, joinrel,
*************** ParamPathInfo *
*** 1334,1340 ****
  get_appendrel_parampathinfo(RelOptInfo *appendrel, Relids required_outer)
  {
  	ParamPathInfo *ppi;
- 	ListCell   *lc;
  
  	/* Unparameterized paths have no ParamPathInfo */
  	if (bms_is_empty(required_outer))
--- 1775,1780 ----
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1343,1354 ****
  	Assert(!bms_overlap(appendrel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	foreach(lc, appendrel->ppilist)
! 	{
! 		ppi = (ParamPathInfo *) lfirst(lc);
! 		if (bms_equal(ppi->ppi_req_outer, required_outer))
! 			return ppi;
! 	}
  
  	/* Else build the ParamPathInfo */
  	ppi = makeNode(ParamPathInfo);
--- 1783,1790 ----
  	Assert(!bms_overlap(appendrel->relids, required_outer));
  
  	/* If we already have a PPI for this parameterization, just return it */
! 	if ((ppi = find_param_path_info(appendrel, required_outer)))
! 		return ppi;
  
  	/* Else build the ParamPathInfo */
  	ppi = makeNode(ParamPathInfo);
*************** get_appendrel_parampathinfo(RelOptInfo *
*** 1359,1361 ****
--- 1795,1917 ----
  
  	return ppi;
  }
+ 
+ /*
+  * Returns a ParamPathInfo for outer relations specified by required_outer, if
+  * already available in the given rel. Returns NULL otherwise.
+  */
+ ParamPathInfo *
+ find_param_path_info(RelOptInfo *rel, Relids required_outer)
+ {
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, rel->ppilist)
+ 	{
+ 		ParamPathInfo  *ppi = (ParamPathInfo *) lfirst(lc);
+ 		if (bms_equal(ppi->ppi_req_outer, required_outer))
+ 			return ppi;
+ 	}
+ 
+ 	return NULL;
+ }
+ 
+ /*
+  * build_joinrel_partition_info
+  *		If the join between the given partitioned relations may itself be
+  *		partitioned, set the partitioning scheme and partition key expressions
+  *		for the join.
+  *
+  * If the two relations have the same partitioning scheme, their join may be
+  * partitioned and will follow the same partitioning scheme as the joining
+  * relations.
+  */
+ static void
+ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
+ 							 RelOptInfo *inner_rel, List *restrictlist,
+ 							 JoinType jointype)
+ {
+ 	int		num_pks;
+ 	int		cnt;
+ 	bool	is_strict;
+ 
+ 	/* Nothing to do if partition-wise join technique is disabled. */
+ 	if (!enable_partition_wise_join)
+ 	{
+ 		joinrel->part_scheme = NULL;
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * The join is not partitioned if any of the relations being joined is
+ 	 * not partitioned, if they do not have the same partitioning scheme, or
+ 	 * if there is no equi-join between partition keys.
+ 	 *
+ 	 * For an N-way inner join, where every syntactic inner join has equi-join
+ 	 * between partition keys and a matching partitioning scheme, partition
+ 	 * keys of N relations form an equivalence class, thus inducing an
+ 	 * equi-join between any pair of joining relations.
+ 	 *
+ 	 * For an N-way join with outer joins, where every syntactic join has an
+ 	 * equi-join between partition keys and a matching partitioning scheme,
+ 	 * outer join reordering identities in optimizer/README imply that only
+ 	 * those pairs of joins are legal which have an equi-join between partition
+ 	 * keys. Thus every pair of joining relations we see here should have an
+ 	 * equi-join if this join has been deemed as a partitioned join.
+ 	 */
+ 	if (!outer_rel->part_scheme || !inner_rel->part_scheme ||
+ 		outer_rel->part_scheme != inner_rel->part_scheme ||
+ 		!have_partkey_equi_join(outer_rel, inner_rel, jointype, restrictlist,
+ 								&is_strict))
+ 	{
+ 		joinrel->part_scheme = NULL;
+ 		return;
+ 	}
+ 
+ 	/*
+ 	 * This function will be called only once for each joinrel, hence it should
+ 	 * not have partition scheme, partition key expressions and array for
+ 	 * storing child relations set.
+ 	 */
+ 	Assert(!joinrel->part_scheme && !joinrel->partexprs &&
+ 		   !joinrel->part_rels);
+ 
+ 	/*
+ 	 * Join relation is partitioned using same partitioning scheme as the
+ 	 * joining relations.
+ 	 */
+ 	joinrel->part_scheme = outer_rel->part_scheme;
+ 	num_pks = joinrel->part_scheme->partnatts;
+ 
+ 	/*
+ 	 * Construct partition keys for the join.
+ 	 *
+ 	 * An INNER join between two partitioned relations is partitioned by key
+ 	 * expressions from both the relations. For tables A and B partitioned by a
+ 	 * and b respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both
+ 	 * A.a and B.b.
+ 	 *
+ 	 * An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
+ 	 * B.b NULL. These rows may not fit the partitioning conditions imposed on
+ 	 * B.b. Hence, strictly speaking, the join is not partitioned by B.b, and
+ 	 * the partition keys of an OUTER join should include partition key
+ 	 * expressions from the OUTER side only. However, consider a join like
+ 	 * (A LEFT JOIN B ON A.a = B.b) LEFT JOIN C ON B.b = C.c. If we do not
+ 	 * include B.b as partition key expression for (AB), it prohibits us from
+ 	 * using partition-wise join when joining (AB) with C as there is no
+ 	 * equi-join between partition keys of joining relations. If the equality
+ 	 * operator is strict, two NULL values are never equal and no two rows from
+ 	 * mis-matching partitions can join. Hence if the equality operator is
+ 	 * strict it's safe to include B.b as partition key expression for (AB),
+ 	 * even though rows in (AB) are not strictly partitioned by B.b.
+ 	 */
+ 	joinrel->partexprs = (List **) palloc0(sizeof(List *) * num_pks);
+ 	for (cnt = 0; cnt < num_pks; cnt++)
+ 	{
+ 		List *pkexpr = list_copy(outer_rel->partexprs[cnt]);
+ 
+ 		if (jointype == JOIN_INNER || is_strict)
+ 			pkexpr = list_concat(pkexpr,
+ 								 list_copy(inner_rel->partexprs[cnt]));
+ 		joinrel->partexprs[cnt] = pkexpr;
+ 	}
+ }
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
new file mode 100644
index 0952385..dd962b7
*** a/src/backend/optimizer/util/tlist.c
--- b/src/backend/optimizer/util/tlist.c
*************** get_sortgrouplist_exprs(List *sgClauses,
*** 408,413 ****
--- 408,487 ----
  	return result;
  }
  
+ /*
+  * get_grouping_expressions
+  *
+  *		Given a "grouped target" (i.e. target where each non-GroupedVar
+  *		element must have sortgroupref set), build a list of the referencing
+  *		SortGroupClauses, a list of the corresponding grouping expressions and
+  *		a list of aggregate expressions.
+  */
+ /* Refine the function name. */
+ void
+ get_grouping_expressions(PlannerInfo *root, PathTarget *target,
+ 						 List **grouping_clauses, List **grouping_exprs,
+ 						 List **agg_exprs)
+ {
+ 	ListCell   *l;
+ 	int		i = 0;
+ 
+ 	foreach(l, target->exprs)
+ 	{
+ 		Index	sortgroupref = 0;
+ 		SortGroupClause *cl;
+ 		Expr		*texpr;
+ 
+ 		texpr = (Expr *) lfirst(l);
+ 
+ 		/* The target should contain at least one grouping column. */
+ 		Assert(target->sortgrouprefs != NULL);
+ 
+ 		if (IsA(texpr, GroupedVar))
+ 		{
+ 			/*
+ 			 * texpr should represent the first aggregate in the targetlist.
+ 			 */
+ 			break;
+ 		}
+ 
+ 		/*
+ 		 * Find the clause by sortgroupref.
+ 		 */
+ 		sortgroupref = target->sortgrouprefs[i++];
+ 
+ 		/*
+ 		 * Besides aggregates, the target should contain no expressions without
+ 		 * sortgroupref. A plain relation being joined to a grouped one can have
+ 		 * sortgroupref equal to zero for expressions contained neither in a
+ 		 * grouping expression nor in aggregate arguments, but if the target
+ 		 * contains such an expression, it shouldn't be used for aggregation
+ 		 * --- see can_aggregate field of GroupedPathInfo.
+ 		 */
+ 		Assert(sortgroupref > 0);
+ 
+ 		cl = get_sortgroupref_clause(sortgroupref, root->parse->groupClause);
+ 		*grouping_clauses = list_append_unique(*grouping_clauses, cl);
+ 
+ 		/*
+ 		 * Add only unique clauses because of joins (both sides of a join can
+ 		 * point at the same grouping clause). XXX Is it worth adding a bool
+ 		 * argument indicating that we're dealing with join right now?
+ 		 */
+ 		*grouping_exprs = list_append_unique(*grouping_exprs, texpr);
+ 	}
+ 
+ 	/* Now collect the aggregates. */
+ 	while (l != NULL)
+ 	{
+ 		GroupedVar	*gvar = castNode(GroupedVar, lfirst(l));
+ 
+ 		/* Currently, GroupedVarInfo can only represent an aggregate. */
+ 		Assert(gvar->agg_partial != NULL);
+ 		*agg_exprs = lappend(*agg_exprs, gvar->agg_partial);
+ 		l = lnext(l);
+ 	}
+ }
+ 
  
  /*****************************************************************************
   *		Functions to extract data from a list of SortGroupClauses
*************** apply_pathtarget_labeling_to_tlist(List
*** 783,788 ****
--- 857,1081 ----
  }
  
  /*
+  * Replace each "grouped var" in the source targetlist with the original
+  * expression.
+  *
+  * TODO Think of a more suitable name. Although "grouped var" may substitute for
+  * grouping expressions in the future, currently Aggref is the only outcome of
+  * the replacement. undo_grouped_var_substitutions?
+  */
+ List *
+ restore_grouping_expressions(PlannerInfo *root, List *src)
+ {
+ 	List	*result = NIL;
+ 	ListCell	*l;
+ 
+ 	foreach(l, src)
+ 	{
+ 		TargetEntry	*te, *te_new;
+ 		Aggref	*expr_new = NULL;
+ 
+ 		te = castNode(TargetEntry, lfirst(l));
+ 
+ 		if (IsA(te->expr, GroupedVar))
+ 		{
+ 			GroupedVar	*gvar;
+ 
+ 			gvar = castNode(GroupedVar, te->expr);
+ 			expr_new = gvar->agg_partial;
+ 		}
+ 
+ 		if (expr_new != NULL)
+ 		{
+ 			te_new = flatCopyTargetEntry(te);
+ 			te_new->expr = (Expr *) expr_new;
+ 		}
+ 		else
+ 			te_new = te;
+ 		result = lappend(result, te_new);
+ 	}
+ 
+ 	return result;
+ }
+ 
+ /*
+  * For each aggregate, add a GroupedVar (which carries the partial Aggref in
+  * agg_partial) to the target.
+  *
+  * If the caller passes the aggregates, it must do so in the form of
+  * GroupedVarInfos so that we don't have to look for gvid. If NIL is passed,
+  * the function retrieves the suitable aggregates itself.
+  *
+  * List of the aggregates added is returned. This is only useful if the
+  * function had to retrieve the aggregates itself (i.e. NIL was passed for
+  * aggregates) -- caller is expected to do extra checks in that case (and to
+  * also free the list).
+  */
+ List *
+ add_aggregates_to_target(PlannerInfo *root, PathTarget *target,
+ 						 List *aggregates, RelOptInfo *rel)
+ {
+ 	ListCell	*lc;
+ 	GroupedVarInfo	*gvi;
+ 
+ 	if (aggregates == NIL)
+ 	{
+ 		/* Caller should pass the aggregates for base relation. */
+ 		Assert(rel->reloptkind != RELOPT_BASEREL);
+ 
+ 		/* Collect all aggregates that this rel can evaluate. */
+ 		foreach(lc, root->grouped_var_list)
+ 		{
+ 			gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 
+ 			/*
+ 			 * Overlap alone does not guarantee correctness, but the caller needs
+ 			 * to do additional checks anyway, so we're optimistic here.
+ 			 *
+ 			 * If gv_eval_at is NULL, the underlying Aggref should have
+ 			 * aggstar set.
+ 			 */
+ 			if (bms_overlap(gvi->gv_eval_at, rel->relids) ||
+ 				gvi->gv_eval_at == NULL)
+ 				aggregates = lappend(aggregates, gvi);
+ 		}
+ 
+ 		if (aggregates == NIL)
+ 			return NIL;
+ 	}
+ 
+ 	/* Create the vars and add them to the target. */
+ 	foreach(lc, aggregates)
+ 	{
+ 		GroupedVar	*gvar;
+ 
+ 		gvi = castNode(GroupedVarInfo, lfirst(lc));
+ 		gvar = makeNode(GroupedVar);
+ 		gvar->gvid = gvi->gvid;
+ 		gvar->gvexpr = gvi->gvexpr;
+ 		gvar->agg_partial = gvi->agg_partial;
+ 		add_new_column_to_pathtarget(target, (Expr *) gvar);
+ 	}
+ 
+ 	return aggregates;
+ }
+ 
+ /*
+  * Return ressortgroupref of the target entry that is either equal to the
+  * expression or exists in the same equivalence class.
+  */
+ Index
+ get_expr_sortgroupref(PlannerInfo *root, Expr *expr)
+ {
+ 	ListCell	*lc;
+ 	Index		sortgroupref;
+ 
+ 	/*
+ 	 * First, check if the query group clause contains exactly this
+ 	 * expression.
+ 	 */
+ 	foreach(lc, root->processed_tlist)
+ 	{
+ 		TargetEntry		*te = castNode(TargetEntry, lfirst(lc));
+ 
+ 		if (equal(expr, te->expr) && te->ressortgroupref > 0)
+ 			return te->ressortgroupref;
+ 	}
+ 
+ 	/*
+ 	 * If exactly this expression is not there, check if a grouping clause
+ 	 * exists that belongs to the same equivalence class as the expression.
+ 	 */
+ 	foreach(lc, root->group_pathkeys)
+ 	{
+ 		PathKey	*pk = castNode(PathKey, lfirst(lc));
+ 		EquivalenceClass		*ec = pk->pk_eclass;
+ 		ListCell		*lm;
+ 		EquivalenceMember		*em;
+ 		Expr	*em_expr = NULL;
+ 		Query	*query = root->parse;
+ 
+ 		/*
+ 		 * A single-member EC cannot provide us with an additional expression.
+ 		 */
+ 		if (list_length(ec->ec_members) < 2)
+ 			continue;
+ 
+ 		/* We need equality anywhere in the join tree. */
+ 		if (ec->ec_below_outer_join)
+ 			continue;
+ 
+ 		/*
+ 		 * TODO Reconsider this restriction. As the grouping expression is
+ 		 * only evaluated at the relation level (and only the result will be
+ 		 * propagated to the final targetlist), volatile function might be
+ 		 * o.k. Need to think what volatile EC exactly means.
+ 		 */
+ 		if (ec->ec_has_volatile)
+ 			continue;
+ 
+ 		foreach(lm, ec->ec_members)
+ 		{
+ 			em = (EquivalenceMember *) lfirst(lm);
+ 
+ 			/* The EC has !ec_below_outer_join. */
+ 			Assert(!em->em_nullable_relids);
+ 			if (equal(em->em_expr, expr))
+ 			{
+ 				em_expr = (Expr *) em->em_expr;
+ 				break;
+ 			}
+ 		}
+ 
+ 		if (em_expr == NULL)
+ 			/* Go for the next EC. */
+ 			continue;
+ 
+ 		/*
+ 		 * Find the corresponding SortGroupClause, which provides us with
+ 		 * sortgroupref. (It can belong to any EC member.)
+ 		 */
+ 		sortgroupref = 0;
+ 		foreach(lm, ec->ec_members)
+ 		{
+ 			ListCell	*lsg;
+ 
+ 			em = (EquivalenceMember *) lfirst(lm);
+ 			foreach(lsg, query->groupClause)
+ 			{
+ 				SortGroupClause	*sgc;
+ 				Expr	*expr;
+ 
+ 				sgc = (SortGroupClause *) lfirst(lsg);
+ 				expr = (Expr *) get_sortgroupclause_expr(sgc,
+ 														 query->targetList);
+ 				if (equal(em->em_expr, expr))
+ 				{
+ 					Assert(sgc->tleSortGroupRef > 0);
+ 					sortgroupref = sgc->tleSortGroupRef;
+ 					break;
+ 				}
+ 			}
+ 
+ 			if (sortgroupref > 0)
+ 				break;
+ 		}
+ 
+ 		/*
+ 		 * Since we searched in group_pathkeys, at least one EM of this EC
+ 		 * should correspond to a SortGroupClause, otherwise the EC could
+ 		 * not exist at all.
+ 		 */
+ 		Assert(sortgroupref > 0);
+ 
+ 		return sortgroupref;
+ 	}
+ 
+ 	/* No EC found in group_pathkeys. */
+ 	return 0;
+ }
+ 
+ /*
   * split_pathtarget_at_srfs
   *		Split given PathTarget into multiple levels to position SRFs safely
   *
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
new file mode 100644
index 184e5da..5e3c3b4
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** get_rule_expr(Node *node, deparse_contex
*** 7559,7564 ****
--- 7559,7572 ----
  			get_agg_expr((Aggref *) node, context, (Aggref *) node);
  			break;
  
+ 		case T_GroupedVar:
+ 		{
+ 			GroupedVar *gvar = castNode(GroupedVar, node);
+ 
+ 			get_agg_expr(gvar->agg_partial, context, (Aggref *) gvar->gvexpr);
+ 			break;
+ 		}
+ 
  		case T_GroupingFunc:
  			{
  				GroupingFunc *gexpr = (GroupingFunc *) node;
*************** get_agg_combine_expr(Node *node, deparse
*** 8993,9002 ****
  	Aggref	   *aggref;
  	Aggref	   *original_aggref = private;
  
! 	if (!IsA(node, Aggref))
  		elog(ERROR, "combining Aggref does not point to an Aggref");
  
- 	aggref = (Aggref *) node;
  	get_agg_expr(aggref, context, original_aggref);
  }
  
--- 9001,9018 ----
  	Aggref	   *aggref;
  	Aggref	   *original_aggref = private;
  
! 	if (IsA(node, Aggref))
! 		aggref = (Aggref *) node;
! 	else if (IsA(node, GroupedVar))
! 	{
! 		GroupedVar *gvar = castNode(GroupedVar, node);
! 
! 		aggref = gvar->agg_partial;
! 		original_aggref = castNode(Aggref, gvar->gvexpr);
! 	}
! 	else
  		elog(ERROR, "combining Aggref does not point to an Aggref");
  
  	get_agg_expr(aggref, context, original_aggref);
  }
  
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
new file mode 100644
index a35b93b..78e24ea
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 114,119 ****
--- 114,120 ----
  #include "catalog/pg_statistic_ext.h"
  #include "catalog/pg_type.h"
  #include "executor/executor.h"
+ #include "executor/nodeAgg.h"
  #include "mb/pg_wchar.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
*************** estimate_hash_bucketsize(PlannerInfo *ro
*** 3705,3710 ****
--- 3706,3744 ----
  	return (Selectivity) estfract;
  }
  
+ /*
+  * estimate_hashagg_tablesize
+  *	  estimate the number of bytes that a hash aggregate hashtable will
+  *	  require based on the agg_costs, path width and dNumGroups.
+  *
+  * XXX this may be over-estimating the size now that hashagg knows to omit
+  * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
+  * grouping columns not in the hashed set are counted here even though hashagg
+  * won't store them. Is this a problem?
+  */
+ Size
+ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
+ 						   double dNumGroups)
+ {
+ 	Size		hashentrysize;
+ 
+ 	/* Estimate per-hash-entry space at tuple width... */
+ 	hashentrysize = MAXALIGN(path->pathtarget->width) +
+ 		MAXALIGN(SizeofMinimalTupleHeader);
+ 
+ 	/* plus space for pass-by-ref transition values... */
+ 	hashentrysize += agg_costs->transitionSpace;
+ 	/* plus the per-hash-entry overhead */
+ 	hashentrysize += hash_agg_entry_size(agg_costs->numAggs);
+ 
+ 	/*
+ 	 * Note that this disregards the effect of fill-factor and growth policy
+ 	 * of the hash-table. That's probably ok, given that the default
+ 	 * fill-factor is relatively high. It'd be hard to meaningfully factor in
+ 	 * "double-in-size" growth policies here.
+ 	 */
+ 	return hashentrysize * dNumGroups;
+ }
  
  /*-------------------------------------------------------------------------
   *
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
new file mode 100644
index 85c6b61..cf94ccc
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
*************** equalPartitionDescs(PartitionKey key, Pa
*** 1204,1210 ****
  			if (partdesc2->boundinfo == NULL)
  				return false;
  
! 			if (!partition_bounds_equal(key, partdesc1->boundinfo,
  										partdesc2->boundinfo))
  				return false;
  		}
--- 1204,1212 ----
  			if (partdesc2->boundinfo == NULL)
  				return false;
  
! 			if (!partition_bounds_equal(key->partnatts, key->parttyplen,
! 										key->parttypbyval,
! 										partdesc1->boundinfo,
  										partdesc2->boundinfo))
  				return false;
  		}
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
new file mode 100644
index a414fb2..343986d
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** static struct config_bool ConfigureNames
*** 914,919 ****
--- 914,928 ----
  		true,
  		NULL, NULL, NULL
  	},
+ 	{
+ 		{"enable_partition_wise_join", PGC_USERSET, QUERY_TUNING_METHOD,
+ 			gettext_noop("Enables partition-wise join."),
+ 			NULL
+ 		},
+ 		&enable_partition_wise_join,
+ 		false,
+ 		NULL, NULL, NULL
+ 	},
  
  	{
  		{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
diff --git a/src/include/catalog/partition.h b/src/include/catalog/partition.h
new file mode 100644
index 421644c..e51bca1
*** a/src/include/catalog/partition.h
--- b/src/include/catalog/partition.h
*************** typedef struct PartitionDispatchData
*** 71,78 ****
  typedef struct PartitionDispatchData *PartitionDispatch;
  
  extern void RelationBuildPartitionDesc(Relation relation);
! extern bool partition_bounds_equal(PartitionKey key,
! 					   PartitionBoundInfo p1, PartitionBoundInfo p2);
  
  extern void check_new_partition_bound(char *relname, Relation parent, Node *bound);
  extern Oid	get_partition_parent(Oid relid);
--- 71,79 ----
  typedef struct PartitionDispatchData *PartitionDispatch;
  
  extern void RelationBuildPartitionDesc(Relation relation);
! extern bool partition_bounds_equal(int partnatts, int16 *parttyplen,
! 					   bool *parttypbyval, PartitionBoundInfo b1,
! 					   PartitionBoundInfo b2);
  
  extern void check_new_partition_bound(char *relname, Relation parent, Node *bound);
  extern Oid	get_partition_parent(Oid relid);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
new file mode 100644
index 6ca44f7..c57ff7b
*** a/src/include/foreign/fdwapi.h
--- b/src/include/foreign/fdwapi.h
*************** typedef void (*ShutdownForeignScan_funct
*** 155,160 ****
--- 155,163 ----
  typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
  															 RelOptInfo *rel,
  														 RangeTblEntry *rte);
+ typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
+ 															List *fdw_private,
+ 													   RelOptInfo *child_rel);
  
  /*
   * FdwRoutine is the struct returned by a foreign-data wrapper's handler
*************** typedef struct FdwRoutine
*** 226,231 ****
--- 229,237 ----
  	InitializeDSMForeignScan_function InitializeDSMForeignScan;
  	InitializeWorkerForeignScan_function InitializeWorkerForeignScan;
  	ShutdownForeignScan_function ShutdownForeignScan;
+ 
+ 	/* Support functions for path reparameterization. */
+ 	ReparameterizeForeignPathByChild_function	ReparameterizeForeignPathByChild;
  } FdwRoutine;
  
  
diff --git a/src/include/nodes/extensible.h b/src/include/nodes/extensible.h
new file mode 100644
index 0b02cc1..1c802ad
*** a/src/include/nodes/extensible.h
--- b/src/include/nodes/extensible.h
*************** typedef struct CustomPathMethods
*** 96,101 ****
--- 96,104 ----
  												List *tlist,
  												List *clauses,
  												List *custom_plans);
+ 	struct List *(*ReparameterizeCustomPathByChild) (PlannerInfo *root,
+ 													 List *custom_private,
+ 													 RelOptInfo *child_rel);
  }	CustomPathMethods;
  
  /*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
new file mode 100644
index f59d719..ba1eac8
*** a/src/include/nodes/nodes.h
--- b/src/include/nodes/nodes.h
*************** typedef enum NodeTag
*** 218,223 ****
--- 218,224 ----
  	T_IndexOptInfo,
  	T_ForeignKeyOptInfo,
  	T_ParamPathInfo,
+ 	T_GroupedPathInfo,
  	T_Path,
  	T_IndexPath,
  	T_BitmapHeapPath,
*************** typedef enum NodeTag
*** 258,267 ****
--- 259,270 ----
  	T_PathTarget,
  	T_RestrictInfo,
  	T_PlaceHolderVar,
+ 	T_GroupedVar,
  	T_SpecialJoinInfo,
  	T_AppendRelInfo,
  	T_PartitionedChildRelInfo,
  	T_PlaceHolderInfo,
+ 	T_GroupedVarInfo,
  	T_MinMaxAggInfo,
  	T_PlannerParamItem,
  	T_RollupData,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
new file mode 100644
index 7a8e2fd..b576dd5
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***************
*** 15,20 ****
--- 15,21 ----
  #define RELATION_H
  
  #include "access/sdir.h"
+ #include "catalog/partition.h"
  #include "lib/stringinfo.h"
  #include "nodes/params.h"
  #include "nodes/parsenodes.h"
*************** typedef struct PlannerInfo
*** 256,261 ****
--- 257,264 ----
  
  	List	   *placeholder_list;		/* list of PlaceHolderInfos */
  
+ 	List		*grouped_var_list; /* List of GroupedVarInfos. */
+ 
  	List	   *fkey_list;		/* list of ForeignKeyOptInfos */
  
  	List	   *query_pathkeys; /* desired pathkeys for query_planner() */
*************** typedef struct PlannerInfo
*** 265,270 ****
--- 268,276 ----
  	List	   *distinct_pathkeys;		/* distinctClause pathkeys, if any */
  	List	   *sort_pathkeys;	/* sortClause pathkeys, if any */
  
+ 	List	   *part_schemes;	/* Canonicalised partition schemes
+ 								 * used in the query. */
+ 
  	List	   *initial_rels;	/* RelOptInfos we are now trying to join */
  
  	/* Use fetch_upper_rel() to get any particular upper rel */
*************** typedef struct PlannerInfo
*** 325,330 ****
--- 331,362 ----
  	((root)->simple_rte_array ? (root)->simple_rte_array[rti] : \
  	 rt_fetch(rti, (root)->parse->rtable))
  
+ /*
+  * Partitioning scheme
+  *		Structure to hold partitioning scheme for a given relation.
+  *
+  * Multiple relations may be partitioned in the same way. The relations
+  * resulting from joining such relations may be partitioned in the same way as
+  * the joining relations. Similarly, relations derived from such relations by
+  * grouping or sorting may be partitioned in the same way as the underlying
+  * scan relations. All such relations partitioned in the same way share the
+  * partitioning scheme.
+  *
+  * PlannerInfo stores a list of distinct "canonical" partitioning schemes.
+  * RelOptInfo of a partitioned relation holds the pointer to "canonical"
+  * partitioning scheme.
+  */
+ typedef struct PartitionSchemeData
+ {
+ 	char		strategy;		/* partition strategy */
+ 	int16		partnatts;		/* number of partition attributes */
+ 	Oid		   *partopfamily;	/* OIDs of operator families */
+ 	Oid		   *partopcintype;	/* OIDs of opclass declared input data types */
+ 	FmgrInfo   *partsupfunc;	/* lookup info for support funcs */
+ 	Oid		   *parttypcoll;	/* OIDs of collations of partition keys. */
+ } PartitionSchemeData;
+ 
+ typedef struct PartitionSchemeData *PartitionScheme;
  
  /*----------
   * RelOptInfo
*************** typedef struct PlannerInfo
*** 359,364 ****
--- 391,401 ----
   * handling join alias Vars.  Currently this is not needed because all join
   * alias Vars are expanded to non-aliased form during preprocess_expression.
   *
+  * We also have relations representing joins between child relations of
+  * different partitioned tables. These relations are not added to
+  * join_rel_level lists as they are not joined directly by the dynamic
+  * programming algorithm.
+  *
   * There is also a RelOptKind for "upper" relations, which are RelOptInfos
   * that describe post-scan/join processing steps, such as aggregation.
   * Many of the fields in these RelOptInfos are meaningless, but their Path
*************** typedef struct PlannerInfo
*** 401,406 ****
--- 438,445 ----
   *		direct_lateral_relids - rels this rel has direct LATERAL references to
   *		lateral_relids - required outer rels for LATERAL, as a Relids set
   *			(includes both direct and indirect lateral references)
+  *		gpi - GroupedPathInfo if the relation can produce grouped paths, NULL
+  *		otherwise.
   *
   * If the relation is a base relation it will have these fields set:
   *
*************** typedef struct PlannerInfo
*** 486,491 ****
--- 525,543 ----
   * We store baserestrictcost in the RelOptInfo (for base relations) because
   * we know we will need it at least once (to price the sequential scan)
   * and may need it multiple times to price index scans.
+  *
+  * If the relation is partitioned, these fields will be set:
+  * 		part_scheme - Partitioning scheme of the relation
+  * 		nparts	- Number of partitions
+  * 		boundinfo	- Partition bounds/lists
+  * 		part_rels	- RelOptInfos of the partition relations
+  * 		partexprs	- Partition key expressions
+  *
+  * Note: A base relation will always have only one set of partition keys. But a
+  * join relation is partitioned by the partition keys of joining relations.
+  * Partition keys are stored as an array of partition key expressions, with
+  * each array element containing a list of one (for a base relation) or more
+  * (as many as the number of joining relations) expressions.
   *----------
   */
  typedef enum RelOptKind
*************** typedef enum RelOptKind
*** 493,498 ****
--- 545,551 ----
  	RELOPT_BASEREL,
  	RELOPT_JOINREL,
  	RELOPT_OTHER_MEMBER_REL,
+ 	RELOPT_OTHER_JOINREL,
  	RELOPT_UPPER_REL,
  	RELOPT_DEADREL
  } RelOptKind;
*************** typedef enum RelOptKind
*** 506,518 ****
  	 (rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  /* Is the given relation a join relation? */
! #define IS_JOIN_REL(rel) ((rel)->reloptkind == RELOPT_JOINREL)
  
  /* Is the given relation an upper relation? */
  #define IS_UPPER_REL(rel) ((rel)->reloptkind == RELOPT_UPPER_REL)
  
  /* Is the given relation an "other" relation? */
! #define IS_OTHER_REL(rel) ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  typedef struct RelOptInfo
  {
--- 559,575 ----
  	 (rel)->reloptkind == RELOPT_OTHER_MEMBER_REL)
  
  /* Is the given relation a join relation? */
! #define IS_JOIN_REL(rel)	\
! 	((rel)->reloptkind == RELOPT_JOINREL || \
! 	 (rel)->reloptkind == RELOPT_OTHER_JOINREL)
  
  /* Is the given relation an upper relation? */
  #define IS_UPPER_REL(rel) ((rel)->reloptkind == RELOPT_UPPER_REL)
  
  /* Is the given relation an "other" relation? */
! #define IS_OTHER_REL(rel) \
! 	((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
! 	 (rel)->reloptkind == RELOPT_OTHER_JOINREL)
  
  typedef struct RelOptInfo
  {
*************** typedef struct RelOptInfo
*** 548,553 ****
--- 605,613 ----
  	Relids		direct_lateral_relids;	/* rels directly laterally referenced */
  	Relids		lateral_relids; /* minimum parameterization of rel */
  
+ 	/* Information needed to produce grouped paths. */
+ 	struct GroupedPathInfo	*gpi;
+ 
  	/* information about a base rel (not set for join rels!) */
  	Index		relid;
  	Oid			reltablespace;	/* containing tablespace */
*************** typedef struct RelOptInfo
*** 566,571 ****
--- 626,632 ----
  	PlannerInfo *subroot;		/* if subquery */
  	List	   *subplan_params; /* if subquery */
  	int			rel_parallel_workers;	/* wanted number of parallel workers */
+ 	Oid		   *part_oids;		/* OIDs of partitions */
  
  	/* Information about foreign tables and foreign joins */
  	Oid			serverid;		/* identifies server for the table or join */
*************** typedef struct RelOptInfo
*** 591,596 ****
--- 652,673 ----
  
  	/* used by "other" relations */
  	Relids		top_parent_relids;		/* Relids of topmost parents */
+ 
+ 	/* For all the partitioned relations. */
+ 	PartitionScheme part_scheme;	/* Partitioning scheme. */
+ 	int			nparts;			/* number of partitions */
+ 	PartitionBoundInfo boundinfo;	/* Partition bounds/lists */
+ 	struct RelOptInfo **part_rels;		/* Array of RelOptInfos of partitions,
+ 										 * stored in the same order as bounds
+ 										 * or lists in PartitionScheme.
+ 										 */
+ 	List	  **partexprs;				/* Array of list of partition key
+ 										 * expressions. For base relations
+ 										 * these are one element lists. For
+ 										 * join there may be as many elements
+ 										 * as the number of joining
+ 										 * relations.
+ 										 */
  } RelOptInfo;
  
  /*
*************** typedef struct ParamPathInfo
*** 913,918 ****
--- 990,1017 ----
  	List	   *ppi_clauses;	/* join clauses available from outer rels */
  } ParamPathInfo;
  
+ /*
+  * GroupedPathInfo
+  *
+  * If RelOptInfo points to this structure, grouped paths can be created for
+  * it.
+  *
+  * "target" will be used as pathtarget of grouped paths produced by this
+  * relation. Grouped path is either a result of aggregation of the relation
+  * that owns this structure or, if the owning relation is a join, a join path
+  * whose one side is a grouped path and the other is a plain (i.e. not
+  * grouped) one. (Two grouped paths cannot be joined in general because
+  * grouping of one side of the join essentially reduces occurrence of groups
+  * of the other side in the input of the final aggregation.)
+  */
+ typedef struct GroupedPathInfo
+ {
+ 	NodeTag		type;
+ 
+ 	PathTarget	*target;		/* output of grouped paths. */
+ 	List	*pathlist;			/* List of grouped paths. */
+ 	List	*partial_pathlist;	/* List of partial grouped paths. */
+ } GroupedPathInfo;
  
  /*
   * Type "Path" is used as-is for sequential-scan paths, as well as some other
*************** typedef struct PlaceHolderVar
*** 1852,1857 ****
--- 1951,1989 ----
  	Index		phlevelsup;		/* > 0 if PHV belongs to outer query */
  } PlaceHolderVar;
  
+ 
+ /*
+  * Similar to the concept of PlaceHolderVar, we treat aggregates and grouping
+  * columns as special variables if grouping is possible below the top-level
+  * join. The reason is that aggregates having start as the argument can be
+  * evaluated at various places in the join tree (i.e. cannot be assigned to
+  * target list of exactly one relation). Also this concept seems to be less
+  * invasive than adding the grouped vars to reltarget (in which case
+  * attr_needed and attr_widths arrays of RelOptInfo) would also need
+  * additional changes.
+  *
+  * gvexpr is a pointer to gvexpr field of the corresponding instance
+  * GroupedVarInfo. It's there for the sake of exprType(), exprCollation(),
+  * etc.
+  *
+  * agg_partial also points to the corresponding field of GroupedVarInfo if the
+  * GroupedVar is in the target of a parent relation (RELOPT_BASEREL). However
+  * within a child relation's (RELOPT_OTHER_MEMBER_REL) target it points to a
+  * copy which has argument expressions translated, so they no longer reference
+  * the parent.
+  *
+  * XXX Currently we only create GroupedVar for aggregates, but sometime we can
+  * do it for grouping keys as well. That would allow grouping below the
+  * top-level join by keys other than plain Var.
+  */
+ typedef struct GroupedVar
+ {
+ 	Expr		xpr;
+ 	Expr		*gvexpr;		/* the represented expression */
+ 	Aggref		*agg_partial;	/* partial aggregate if gvexpr is aggregate */
+ 	Index		gvid;		/* GroupedVarInfo */
+ } GroupedVar;
+ 
  /*
   * "Special join" info.
   *
*************** typedef struct PlaceHolderInfo
*** 2067,2072 ****
--- 2199,2220 ----
  } PlaceHolderInfo;
  
  /*
+  * Likewise, GroupedVarInfo exists for each distinct GroupedVar.
+  */
+ typedef struct GroupedVarInfo
+ {
+ 	NodeTag		type;
+ 
+ 	Index		gvid;			/* GroupedVar.gvid */
+ 	Expr		*gvexpr;		/* the represented expression. */
+ 	Aggref		*agg_partial;	/* if gvexpr is aggregate, agg_partial is
+ 								 * the corresponding partial aggregate */
+ 	Relids		gv_eval_at;		/* lowest level we can evaluate the expression
+ 								 * at or NULL if it can happen anywhere. */
+ 	int32		gv_width;		/* estimated width of the expression */
+ } GroupedVarInfo;
+ 
+ /*
   * This struct describes one potentially index-optimizable MIN/MAX aggregate
   * function.  MinMaxAggPath contains a list of these, and if we accept that
   * path, the list is stored into root->minmax_aggs for use during setrefs.c.
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
new file mode 100644
index ed70def..ca06455
*** a/src/include/optimizer/cost.h
--- b/src/include/optimizer/cost.h
*************** extern bool enable_material;
*** 67,72 ****
--- 67,73 ----
  extern bool enable_mergejoin;
  extern bool enable_hashjoin;
  extern bool enable_gathermerge;
+ extern bool enable_partition_wise_join;
  extern int	constraint_exclusion;
  
  extern double clamp_row_est(double nrows);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
new file mode 100644
index 77bc770..4a0d845
*** a/src/include/optimizer/pathnode.h
--- b/src/include/optimizer/pathnode.h
*************** extern int compare_path_costs(Path *path
*** 25,37 ****
  extern int compare_fractional_path_costs(Path *path1, Path *path2,
  							  double fraction);
  extern void set_cheapest(RelOptInfo *parent_rel);
! extern void add_path(RelOptInfo *parent_rel, Path *new_path);
  extern bool add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 				  List *pathkeys, Relids required_outer);
! extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path);
  extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
! 						  Cost total_cost, List *pathkeys);
  
  extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
  					Relids required_outer, int parallel_workers);
--- 25,39 ----
  extern int compare_fractional_path_costs(Path *path1, Path *path2,
  							  double fraction);
  extern void set_cheapest(RelOptInfo *parent_rel);
! extern void add_path(RelOptInfo *parent_rel, Path *new_path, bool grouped);
  extern bool add_path_precheck(RelOptInfo *parent_rel,
  				  Cost startup_cost, Cost total_cost,
! 							  List *pathkeys, Relids required_outer, bool grouped);
! extern void add_partial_path(RelOptInfo *parent_rel, Path *new_path,
! 							 bool grouped);
  extern bool add_partial_path_precheck(RelOptInfo *parent_rel,
! 									  Cost total_cost, List *pathkeys,
! 									  bool grouped);
  
  extern Path *create_seqscan_path(PlannerInfo *root, RelOptInfo *rel,
  					Relids required_outer, int parallel_workers);
*************** extern ForeignPath *create_foreignscan_p
*** 112,118 ****
  						Path *fdw_outerpath,
  						List *fdw_private);
  
! extern Relids calc_nestloop_required_outer(Path *outer_path, Path *inner_path);
  extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
  
  extern NestPath *create_nestloop_path(PlannerInfo *root,
--- 114,123 ----
  						Path *fdw_outerpath,
  						List *fdw_private);
  
! extern Relids calc_nestloop_required_outer(Relids outerrelids,
! 							 Relids outer_paramrels,
! 							 Relids innerrelids,
! 							 Relids inner_paramrels);
  extern Relids calc_non_nestloop_required_outer(Path *outer_path, Path *inner_path);
  
  extern NestPath *create_nestloop_path(PlannerInfo *root,
*************** extern NestPath *create_nestloop_path(Pl
*** 124,130 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer);
  
  extern MergePath *create_mergejoin_path(PlannerInfo *root,
  					  RelOptInfo *joinrel,
--- 129,136 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 List *pathkeys,
! 					 Relids required_outer,
! 					 PathTarget *target);
  
  extern MergePath *create_mergejoin_path(PlannerInfo *root,
  					  RelOptInfo *joinrel,
*************** extern MergePath *create_mergejoin_path(
*** 138,144 ****
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys);
  
  extern HashPath *create_hashjoin_path(PlannerInfo *root,
  					 RelOptInfo *joinrel,
--- 144,151 ----
  					  Relids required_outer,
  					  List *mergeclauses,
  					  List *outersortkeys,
! 					  List *innersortkeys,
! 					  PathTarget *target);
  
  extern HashPath *create_hashjoin_path(PlannerInfo *root,
  					 RelOptInfo *joinrel,
*************** extern HashPath *create_hashjoin_path(Pl
*** 149,155 ****
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses);
  
  extern ProjectionPath *create_projection_path(PlannerInfo *root,
  					   RelOptInfo *rel,
--- 156,163 ----
  					 Path *inner_path,
  					 List *restrict_clauses,
  					 Relids required_outer,
! 					 List *hashclauses,
! 					 PathTarget *target);
  
  extern ProjectionPath *create_projection_path(PlannerInfo *root,
  					   RelOptInfo *rel,
*************** extern AggPath *create_agg_path(PlannerI
*** 190,195 ****
--- 198,217 ----
  				List *qual,
  				const AggClauseCosts *aggcosts,
  				double numGroups);
+ extern AggPath *create_partial_agg_sorted_path(PlannerInfo *root,
+ 											   Path *subpath,
+ 											   bool first_call,
+ 											   List **group_clauses,
+ 											   List **group_exprs,
+ 											   List **agg_exprs,
+ 											   double input_rows);
+ extern AggPath *create_partial_agg_hashed_path(PlannerInfo *root,
+ 											   Path *subpath,
+ 											   bool first_call,
+ 											   List **group_clauses,
+ 											   List **group_exprs,
+ 											   List **agg_exprs,
+ 											   double input_rows);
  extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
  						 RelOptInfo *rel,
  						 Path *subpath,
*************** extern LimitPath *create_limit_path(Plan
*** 248,253 ****
--- 270,277 ----
  extern Path *reparameterize_path(PlannerInfo *root, Path *path,
  					Relids required_outer,
  					double loop_count);
+ extern Path *reparameterize_path_by_child(PlannerInfo *root, Path *path,
+ 					RelOptInfo *child_rel);
  
  /*
   * prototypes for relnode.c
*************** extern ParamPathInfo *get_joinrel_paramp
*** 285,289 ****
--- 309,320 ----
  						  List **restrict_clauses);
  extern ParamPathInfo *get_appendrel_parampathinfo(RelOptInfo *appendrel,
  							Relids required_outer);
+ extern ParamPathInfo *find_param_path_info(RelOptInfo *rel,
+ 									  Relids required_outer);
+ extern void prepare_rel_for_grouping(PlannerInfo *root, RelOptInfo *rel);
+ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
+ 					 RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+ 					 RelOptInfo *parent_joinrel, List *restrictlist,
+ 					 SpecialJoinInfo *sjinfo, JoinType jointype);
  
  #endif   /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
new file mode 100644
index 25fe78c..8dd4efd
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** extern void set_dummy_rel_pathlist(RelOp
*** 53,63 ****
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
  										Path *bitmapqual);
  
  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
--- 53,69 ----
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
! 								  bool grouped);
! extern void create_grouped_path(PlannerInfo *root, RelOptInfo *rel,
! 								Path *subpath, bool precheck, bool partial,
! 								AggStrategy aggstrategy);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
  										Path *bitmapqual);
+ extern void generate_partition_wise_join_paths(PlannerInfo *root,
+ 											   RelOptInfo *rel);
  
  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
*************** extern void debug_print_rel(PlannerInfo
*** 67,73 ****
   * indxpath.c
   *	  routines to generate index paths
   */
! extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
  extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  							  List *restrictlist,
  							  List *exprlist, List *oprlist);
--- 73,80 ----
   * indxpath.c
   *	  routines to generate index paths
   */
! extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel,
! 							   bool grouped);
  extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  							  List *restrictlist,
  							  List *exprlist, List *oprlist);
*************** extern bool have_join_order_restriction(
*** 111,116 ****
--- 118,126 ----
  							RelOptInfo *rel1, RelOptInfo *rel2);
  extern bool have_dangerous_phv(PlannerInfo *root,
  				   Relids outer_relids, Relids inner_params);
+ extern void mark_dummy_rel(RelOptInfo *rel);
+ extern bool have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2,
+ 					   JoinType jointype, List *restrictlist, bool *is_strict);
  
  /*
   * equivclass.c
diff --git a/src/include/optimizer/placeholder.h b/src/include/optimizer/placeholder.h
new file mode 100644
index 11e6403..8598268
*** a/src/include/optimizer/placeholder.h
--- b/src/include/optimizer/placeholder.h
*************** extern void fix_placeholder_input_needed
*** 28,32 ****
--- 28,34 ----
  extern void add_placeholders_to_base_rels(PlannerInfo *root);
  extern void add_placeholders_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
  							RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+ extern void add_placeholders_to_child_joinrel(PlannerInfo *root,
+ 							RelOptInfo *childrel, RelOptInfo *parentrel);
  
  #endif   /* PLACEHOLDER_H */
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
new file mode 100644
index 5df68a2..07bc4c0
*** a/src/include/optimizer/planmain.h
--- b/src/include/optimizer/planmain.h
*************** extern int	join_collapse_limit;
*** 74,80 ****
  extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
  extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
  extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
! 					   Relids where_needed, bool create_new_ph);
  extern void find_lateral_references(PlannerInfo *root);
  extern void create_lateral_join_info(PlannerInfo *root);
  extern List *deconstruct_jointree(PlannerInfo *root);
--- 74,82 ----
  extern void add_base_rels_to_query(PlannerInfo *root, Node *jtnode);
  extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
  extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
! 								   Relids where_needed, bool create_new_ph);
! extern void add_grouping_info_to_base_rels(PlannerInfo *root);
! extern void add_grouped_vars_to_rels(PlannerInfo *root);
  extern void find_lateral_references(PlannerInfo *root);
  extern void create_lateral_join_info(PlannerInfo *root);
  extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
new file mode 100644
index f3aaa23..4a550bb
*** a/src/include/optimizer/planner.h
--- b/src/include/optimizer/planner.h
*************** extern Expr *preprocess_phv_expression(P
*** 58,62 ****
--- 58,64 ----
  extern bool plan_cluster_use_sort(Oid tableOid, Oid indexOid);
  
  extern List *get_partitioned_child_rels(PlannerInfo *root, Index rti);
+ extern List *get_partitioned_child_rels_for_join(PlannerInfo *root,
+ 									RelOptInfo *joinrel);
  
  #endif   /* PLANNER_H */
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
new file mode 100644
index 2b20b36..95802c9
*** a/src/include/optimizer/prep.h
--- b/src/include/optimizer/prep.h
*************** extern RelOptInfo *plan_set_operations(P
*** 53,61 ****
  extern void expand_inherited_tables(PlannerInfo *root);
  
  extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
! 					   AppendRelInfo *appinfo);
  
  extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  RelOptInfo *child_rel);
  
  #endif   /* PREP_H */
--- 53,74 ----
  extern void expand_inherited_tables(PlannerInfo *root);
  
  extern Node *adjust_appendrel_attrs(PlannerInfo *root, Node *node,
! 					   int nappinfos, AppendRelInfo **appinfos);
  
  extern Node *adjust_appendrel_attrs_multilevel(PlannerInfo *root, Node *node,
! 								  Relids child_relids,
! 								  Relids top_parent_relids);
! 
! extern Relids adjust_child_relids(Relids relids, int nappinfos,
! 					AppendRelInfo **appinfos);
! 
! extern AppendRelInfo **find_appinfos_by_relids(PlannerInfo *root,
! 						Relids relids, int *nappinfos);
! 
! extern SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
! 									SpecialJoinInfo *parent_sjinfo,
! 									Relids left_relids, Relids right_relids);
! extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
! 							   Relids child_relids, Relids top_parent_relids);
  
  #endif   /* PREP_H */
diff --git a/src/include/optimizer/tlist.h b/src/include/optimizer/tlist.h
new file mode 100644
index ccb93d8..ddea03c
*** a/src/include/optimizer/tlist.h
--- b/src/include/optimizer/tlist.h
*************** extern Node *get_sortgroupclause_expr(So
*** 41,46 ****
--- 41,49 ----
  						 List *targetList);
  extern List *get_sortgrouplist_exprs(List *sgClauses,
  						List *targetList);
+ extern void get_grouping_expressions(PlannerInfo *root, PathTarget *target,
+ 									 List **grouping_clauses,
+ 									 List **grouping_exprs, List **agg_exprs);
  
  extern SortGroupClause *get_sortgroupref_clause(Index sortref,
  						List *clauses);
*************** extern void split_pathtarget_at_srfs(Pla
*** 65,70 ****
--- 68,84 ----
  						 PathTarget *target, PathTarget *input_target,
  						 List **targets, List **targets_contain_srfs);
  
+ /* TODO Find the best location (position and in some cases even file) for the
+  * following ones. */
+ extern List *restore_grouping_expressions(PlannerInfo *root, List *src);
+ extern List *add_aggregates_to_target(PlannerInfo *root, PathTarget *target,
+ 									  List *aggregates, RelOptInfo *rel);
+ extern Index get_expr_sortgroupref(PlannerInfo *root, Expr *expr);
+ /* TODO Move definition from initsplan.c to tlist.c. */
+ extern PathTarget *create_grouped_target(PlannerInfo *root, RelOptInfo *rel,
+ 										 Relids rel_agg_attrs,
+ 										 List *rel_agg_vars);
+ 
  /* Convenience macro to get a PathTarget with valid cost/width fields */
  #define create_pathtarget(root, tlist) \
  	set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
new file mode 100644
index 9f9d2dc..e05e6f6
*** a/src/include/utils/selfuncs.h
--- b/src/include/utils/selfuncs.h
*************** extern double estimate_num_groups(Planne
*** 206,211 ****
--- 206,214 ----
  
  extern Selectivity estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey,
  						 double nbuckets);
+ extern Size estimate_hashagg_tablesize(Path *path,
+ 									   const AggClauseCosts *agg_costs,
+ 									   double dNumGroups);
  
  extern List *deconstruct_indexquals(IndexPath *path);
  extern void genericcostestimate(PlannerInfo *root, IndexPath *path,
diff --git a/src/test/regress/expected/inherit.out b/src/test/regress/expected/inherit.out
new file mode 100644
index 6163ed8..7a969f2
*** a/src/test/regress/expected/inherit.out
--- b/src/test/regress/expected/inherit.out
*************** select tableoid::regclass::text as relna
*** 625,630 ****
--- 625,652 ----
  (3 rows)
  
  drop table parted_tab;
+ -- Check UPDATE with *multi-level partitioned* inherited target
+ create table mlparted_tab (a int, b char, c text) partition by list (a);
+ create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
+ create table mlparted_tab_part2 partition of mlparted_tab for values in (2) partition by list (b);
+ create table mlparted_tab_part3 partition of mlparted_tab for values in (3);
+ create table mlparted_tab_part2a partition of mlparted_tab_part2 for values in ('a');
+ create table mlparted_tab_part2b partition of mlparted_tab_part2 for values in ('b');
+ insert into mlparted_tab values (1, 'a'), (2, 'a'), (2, 'b'), (3, 'a');
+ update mlparted_tab mlp set c = 'xxx'
+ from
+   (select a from some_tab union all select a+1 from some_tab) ss (a)
+ where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
+ select tableoid::regclass::text as relname, mlparted_tab.* from mlparted_tab order by 1,2;
+        relname       | a | b |  c  
+ ---------------------+---+---+-----
+  mlparted_tab_part1  | 1 | a | 
+  mlparted_tab_part2a | 2 | a | 
+  mlparted_tab_part2b | 2 | b | xxx
+  mlparted_tab_part3  | 3 | a | xxx
+ (4 rows)
+ 
+ drop table mlparted_tab;
  drop table some_tab cascade;
  NOTICE:  drop cascades to table some_tab_child
  /* Test multiple inheritance of column defaults */
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
new file mode 100644
index 568b783..cd1f7f3
*** a/src/test/regress/expected/sysviews.out
--- b/src/test/regress/expected/sysviews.out
*************** select count(*) >= 0 as ok from pg_prepa
*** 70,90 ****
  -- This is to record the prevailing planner enable_foo settings during
  -- a regression test run.
  select name, setting from pg_settings where name like 'enable%';
!          name         | setting 
! ----------------------+---------
!  enable_bitmapscan    | on
!  enable_gathermerge   | on
!  enable_hashagg       | on
!  enable_hashjoin      | on
!  enable_indexonlyscan | on
!  enable_indexscan     | on
!  enable_material      | on
!  enable_mergejoin     | on
!  enable_nestloop      | on
!  enable_seqscan       | on
!  enable_sort          | on
!  enable_tidscan       | on
! (12 rows)
  
  -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
  -- more-or-less working.  We can't test their contents in any great detail
--- 70,91 ----
  -- This is to record the prevailing planner enable_foo settings during
  -- a regression test run.
  select name, setting from pg_settings where name like 'enable%';
!             name            | setting 
! ----------------------------+---------
!  enable_bitmapscan          | on
!  enable_gathermerge         | on
!  enable_hashagg             | on
!  enable_hashjoin            | on
!  enable_indexonlyscan       | on
!  enable_indexscan           | on
!  enable_material            | on
!  enable_mergejoin           | on
!  enable_nestloop            | on
!  enable_partition_wise_join | off
!  enable_seqscan             | on
!  enable_sort                | on
!  enable_tidscan             | on
! (13 rows)
  
  -- Test that the pg_timezone_names and pg_timezone_abbrevs views are
  -- more-or-less working.  We can't test their contents in any great detail
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
new file mode 100644
index 1f8f098..2d14885
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
*************** test: publication subscription
*** 103,109 ****
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass
  # ----------
  # Another group of parallel tests
  # NB: temp.sql does a reconnect which transiently uses 2 connections,
--- 103,109 ----
  # ----------
  # Another group of parallel tests
  # ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock json jsonb json_encoding indirect_toast equivclass partition_join multi_level_partition_join
  # ----------
  # Another group of parallel tests
  # NB: temp.sql does a reconnect which transiently uses 2 connections,
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
new file mode 100644
index 04206c3..9ac24dd
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
*************** test: with
*** 179,181 ****
--- 179,183 ----
  test: xml
  test: event_trigger
  test: stats
+ test: partition_join
+ test: multi_level_partition_join
diff --git a/src/test/regress/sql/inherit.sql b/src/test/regress/sql/inherit.sql
new file mode 100644
index d43b75c..b814a4c
*** a/src/test/regress/sql/inherit.sql
--- b/src/test/regress/sql/inherit.sql
*************** where parted_tab.a = ss.a;
*** 154,159 ****
--- 154,176 ----
  select tableoid::regclass::text as relname, parted_tab.* from parted_tab order by 1,2;
  
  drop table parted_tab;
+ 
+ -- Check UPDATE with *multi-level partitioned* inherited target
+ create table mlparted_tab (a int, b char, c text) partition by list (a);
+ create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
+ create table mlparted_tab_part2 partition of mlparted_tab for values in (2) partition by list (b);
+ create table mlparted_tab_part3 partition of mlparted_tab for values in (3);
+ create table mlparted_tab_part2a partition of mlparted_tab_part2 for values in ('a');
+ create table mlparted_tab_part2b partition of mlparted_tab_part2 for values in ('b');
+ insert into mlparted_tab values (1, 'a'), (2, 'a'), (2, 'b'), (3, 'a');
+ 
+ update mlparted_tab mlp set c = 'xxx'
+ from
+   (select a from some_tab union all select a+1 from some_tab) ss (a)
+ where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
+ select tableoid::regclass::text as relname, mlparted_tab.* from mlparted_tab order by 1,2;
+ 
+ drop table mlparted_tab;
  drop table some_tab cascade;
  
  /* Test multiple inheritance of column defaults */

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Antonin Houska (#8)

Re: Partition-wise aggregation/grouping

On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska <ah@cybertec.at> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Apr 26, 2017 at 6:28 AM, Antonin Houska <ah@cybertec.at> wrote:

Attached is a diff that contains both patches merged. This is just to prove my
assumption, details to be elaborated later. The scripts attached produce the
following plan in my environment:

                   QUERY PLAN
------------------------------------------------
 Parallel Finalize HashAggregate
   Group Key: b_1.j
   -> Append
         -> Parallel Partial HashAggregate
               Group Key: b_1.j
               -> Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     -> Seq Scan on b_1
                     -> Hash
                           -> Seq Scan on c_1
         -> Parallel Partial HashAggregate
               Group Key: b_2.j
               -> Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     -> Seq Scan on b_2
                     -> Hash
                           -> Seq Scan on c_2

Well, I'm confused. I see that there's a relationship between what
Antonin is trying to do and what Jeevan is trying to do, but I can't
figure out whether one is a subset of the other, whether they're both
orthogonal, or something else. This plan looks similar to what I
would expect Jeevan's patch to produce,

The point is that the patch Jeevan wanted to work on is actually a subset of
[1] combined with [2].

It seems so, since you are targeting every relation whether or not it is
partitioned, whereas I am targeting only partitioned relations in my
patch.

except I have no idea what "Parallel" would mean in a plan that contains no
Gather node.

The parallel_aware field was mistakenly set on the AggPath. A fixed patch is
attached below, producing this plan:

                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.j
   -> Append
         -> Partial HashAggregate
               Group Key: b_1.j
               -> Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     -> Seq Scan on b_1
                     -> Hash
                           -> Seq Scan on c_1
         -> Partial HashAggregate
               Group Key: b_2.j
               -> Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     -> Seq Scan on b_2
                     -> Hash
                           -> Seq Scan on c_2

With my patch, I get the following plan, where the entire aggregation is
pushed below the Append.

                QUERY PLAN
------------------------------------------
 Append
   -> HashAggregate
         Group Key: b_1.j
         -> Hash Join
               Hash Cond: (b_1.j = c_1.k)
               -> Seq Scan on b_1
               -> Hash
                     -> Seq Scan on c_1
   -> HashAggregate
         Group Key: b_2.j
         -> Hash Join
               Hash Cond: (b_2.j = c_2.k)
               -> Seq Scan on b_2
               -> Hash
                     -> Seq Scan on c_2
(15 rows)

Antonin, I have tried applying your patch on master, but it does not apply.
Can you please tell me which HEAD commit it is based on, and whether any other
changes need to be applied first?

What does the plan look like when the GROUP BY key does not match the
partitioning key, i.e. GROUP BY b.v?

[1] /messages/by-id/9666.1491295317@localhost

[2] https://commitfest.postgresql.org/14/994/

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb


#10

Antonin Houska

ah@cybertec.at

over 8 years ago

In reply to: Jeevan Chalke (#9)

Re: Partition-wise aggregation/grouping

Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:

On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska <ah@cybertec.at> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

Well, I'm confused. I see that there's a relationship between what
Antonin is trying to do and what Jeevan is trying to do, but I can't
figure out whether one is a subset of the other, whether they're both
orthogonal, or something else. This plan looks similar to what I
would expect Jeevan's patch to produce,

The point is that the patch Jeevan wanted to work on is actually a subset of
[1] combined with [2].

It seems so, since you are targeting every relation whether or not it is
partitioned.

Yes.

With my patch, I get the following plan, where the entire aggregation is
pushed below the Append.

                QUERY PLAN
------------------------------------------
 Append
   -> HashAggregate
         Group Key: b_1.j
         -> Hash Join
               Hash Cond: (b_1.j = c_1.k)
               -> Seq Scan on b_1
               -> Hash
                     -> Seq Scan on c_1
   -> HashAggregate
         Group Key: b_2.j
         -> Hash Join
               Hash Cond: (b_2.j = c_2.k)
               -> Seq Scan on b_2
               -> Hash
                     -> Seq Scan on c_2
(15 rows)

I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.
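
One way to read this concern is sketched below in C. The helper names are hypothetical and nothing here is taken from either patch. The sketch assumes a count(*) aggregate whose partial state is simply a row count: if a later join duplicates a row carrying a partial state, the final combine step still produces the same value as aggregating after the join, whereas an already-finalized value carried through the join would not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: combine the partial count(*) states that reach the
 * top-level (finalizing) aggregate for one group.  Each array element is the
 * partial count carried by one input row of the final aggregation.
 */
long
combine_partial_counts(const long *partial_counts, size_t nrows)
{
	long		total = 0;

	for (size_t i = 0; i < nrows; i++)
	{
		assert(partial_counts[i] >= 0);
		total += partial_counts[i];
	}
	return total;
}

/*
 * Reference result: count(*) for the group when aggregation runs only after
 * the join.  Each of the group's rows is repeated once per join match.
 */
long
count_after_join(long rows_in_group, long join_matches)
{
	return rows_in_group * join_matches;
}
```

For example, a group with 2 rows aggregated below a join that has 3 matches yields three copies of the partial count 2; combining them gives 6, which equals count_after_join(2, 3). A finalized value of 2 on each of those three rows would have no later step to correct it.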

Antonin, I have tried applying your patch on master, but it does not apply.
Can you please tell me which HEAD commit it is based on, and whether any other
changes need to be applied first?

I've lost that information. I'll post a new version to the [1] thread asap.

What does the plan look like when the GROUP BY key does not match the
partitioning key, i.e. GROUP BY b.v?

EXPLAIN (COSTS false)
SELECT b.v, avg(b.v + c.v)
FROM b
     JOIN c ON b.j = c.k
GROUP BY b.v;

                   QUERY PLAN
------------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.v
   -> Append
         -> Partial HashAggregate
               Group Key: b_1.v
               -> Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     -> Seq Scan on b_1
                     -> Hash
                           -> Seq Scan on c_1
         -> Partial HashAggregate
               Group Key: b_2.v
               -> Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     -> Seq Scan on b_2
                     -> Hash
                           -> Seq Scan on c_2

[1] /messages/by-id/9666.1491295317@localhost

[2] https://commitfest.postgresql.org/14/994/

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Antonin Houska (#10)

Re: Partition-wise aggregation/grouping

On Fri, Apr 28, 2017 at 3:03 AM, Antonin Houska <ah@cybertec.at> wrote:

I think this is not generic enough because the result of the Append plan can
be joined to another relation. As such a join can duplicate the
already-aggregated values, the aggregates should not be finalized below the
top-level plan.

If the grouping key matches the partition key, then it's correct to
push the entire aggregate down, and there's probably a large
performance advantage from avoiding aggregating twice. If the two
don't match, then pushing the aggregate down necessarily involves a
"partial" and a "finalize" stage, which may or may not be cheaper than
doing the aggregation all at once. If you have lots of 2-row groups
with 1 row in the first branch of the append and 1 row in the second
branch of the append, breaking the aggregate into two steps is
probably going to be a loser. If the overall number of groups is
small, it's probably going to win. But when the grouping key matches
the partition key, so that two-stage aggregation isn't required, I
suspect the pushdown should almost always win.
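
The trade-off described above can be made concrete with a rough Python sketch. It only counts rows that would cross the Append under each strategy; the function `rows_through_append`, the column name `g`, and the data sets are all invented for illustration, and this is not how the planner estimates cost.

```python
def rows_through_append(partitions, key):
    """Count rows crossing the Append under each strategy.

    partitions: list of partitions, each a list of row dicts.
    key: name of the grouping column.
    Returns (rows without pushdown, rows with pushdown, distinct groups).
    """
    # No pushdown: every input row crosses the Append before aggregation.
    no_pushdown = sum(len(p) for p in partitions)
    # Pushdown: each partition emits one row per distinct group it contains.
    pushdown = sum(len({row[key] for row in p}) for p in partitions)
    total_groups = len({row[key] for p in partitions for row in p})
    return no_pushdown, pushdown, total_groups

# Lots of 2-row groups, one row in each branch of the Append.
many_small = [[{"g": i} for i in range(1000)] for _ in range(2)]
# A small number of groups, each present in both partitions.
few_large = [[{"g": i % 3} for i in range(1000)] for _ in range(2)]
# Grouping key matches the partition key: no group spans partitions.
key_matches = [[{"g": part * 3 + i % 3} for i in range(1000)]
               for part in range(2)]

print(rows_through_append(many_small, "g"))   # (2000, 2000, 1000)
print(rows_through_append(few_large, "g"))    # (2000, 6, 3)
print(rows_through_append(key_matches, "g"))  # (2000, 6, 6)
```

In the first case the partial step removes nothing, so the finalize step still sees all 2000 rows and the work is done twice. In the second, only 6 rows reach the finalize step. In the third, the rows emitted by the partitions already equal the number of groups, so no finalize step is needed.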

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#12

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#1)

1 attachment(s)

Re: Partition-wise aggregation/grouping

Hi,

Attached is the patch to implement partition-wise aggregation/grouping.

As explained earlier, we produce a full aggregation for each partition when
the partition keys are the leading GROUP BY clauses, and then an Append is
performed. Otherwise we do a partial aggregation on each partition, append
the results, and then add a finalization step over them.

I have observed that the cost estimated for partition-wise aggregation and
the cost for plans without partition-wise aggregation are almost the same.
However, execution time shows significant improvement (as explained in my
very first email) with partition-wise aggregates. The planner chooses a plan
according to the costs, and thus most of the time the plan without
partition-wise aggregation is chosen. Hence, to force partition-wise plans
and for the regression runs, I have added a GUC named
partition_wise_agg_cost_factor to adjust the costing.

This feature is only used when the enable_partition_wise_agg GUC is set to on.

Here are the details of the patches in the patch-set:

0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation builds paths the same way as
create_grouping_paths(), path creation for sort and hash aggregation is split
into separate functions. These functions are later used by the main
partition-wise aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass the targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor
GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.

Since this patch is highly dependent on partition-wise join [1], one needs to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs?

[1] /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#13

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#12)

1 attachment(s)

Re: Partition-wise aggregation/grouping

On Wed, Aug 23, 2017 at 4:43 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Hi,

Attached is the patch to implement partition-wise aggregation/grouping.

As explained earlier, we produce a full aggregation for each partition when
the partition keys are the leading GROUP BY clauses, and then an Append is
performed. Otherwise we do a partial aggregation on each partition, append
the results, and then add a finalization step over them.

I have observed that the cost estimated for partition-wise aggregation and
the cost for plans without partition-wise aggregation are almost the same.
However, execution time shows significant improvement (as explained in my
very first email) with partition-wise aggregates. The planner chooses a plan
according to the costs, and thus most of the time the plan without
partition-wise aggregation is chosen. Hence, to force partition-wise plans
and for the regression runs, I have added a GUC named
partition_wise_agg_cost_factor to adjust the costing.

This feature is only used when the enable_partition_wise_agg GUC is set to on.

Here are the details of the patches in the patch-set:

Here is the new patch-set, rebased on HEAD (f0a0c17) and the
latest partition-wise join (v29) patches.

0001 - Refactors sort and hash final grouping paths into separate functions.
Since partition-wise aggregation builds paths the same way as
create_grouping_paths(), path creation for sort and hash aggregation is split
into separate functions. These functions are later used by the main
partition-wise aggregation/grouping patch.

0002 - Passes targetlist to get_number_of_groups().
We need to estimate groups for individual child relations and thus need to
pass the targetlist corresponding to the child rel.

0003 - Adds enable_partition_wise_agg and partition_wise_agg_cost_factor
GUCs.

0004 - Implements partition-wise aggregation.

0005 - Adds test-cases.

0006 - postgres_fdw changes which enable pushing aggregation for other upper
relations.

0007 - Provides infrastructure to allow partial aggregation.
This will allow us to push partial aggregation over the FDW.
With this, one can write SUM(PARTIAL x) to get a partial sum
result. Since PARTIAL is used in the syntax, I need to move it
to the reserved keywords category. This is a kind of PoC patch
and needs input on the approach and the way it is implemented.

0008 - Teaches postgres_fdw to push partial aggregation.
With this we can push aggregates to the remote server even when
the GROUP BY key does not match the PARTITION key.
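
To illustrate why 0007/0008 need a partial result rather than a finished aggregate, here is a small Python sketch. It is an assumption-laden illustration only: the helpers `partial_avg`, `combine`, and `finalize` are hypothetical, do not reflect the patch's SUM(PARTIAL x) syntax or postgres_fdw internals, and use avg() because its partial state (sum and count) differs visibly from its final value.

```python
def partial_avg(values):
    """Partial state for avg(): (sum, count), not the average itself."""
    return (sum(values), len(values))

def combine(states):
    """Merge partial states received from several partitions."""
    return (sum(s for s, _ in states), sum(c for _, c in states))

def finalize(state):
    """Turn a combined state into the final average."""
    total, count = state
    return total / count if count else None

# Hypothetical rows of one group spread across two (remote) partitions.
p1 = [1, 2, 3]
p2 = [10]

states = [partial_avg(p1), partial_avg(p2)]   # [(6, 3), (10, 1)]
avg_correct = finalize(combine(states))

# If each partition returned a finished average instead, the local
# server could not reconstruct the true result from them alone.
avg_of_avgs = (finalize(states[0]) + finalize(states[1])) / 2

print(avg_correct)   # 4.0
print(avg_of_avgs)   # 6.0
```

For SUM the partial state is just the running sum, which is why a partial sum is the simplest case to push down first.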

Since this patch is highly dependent on partition-wise join [1], one needs to
apply all those patches on HEAD (my repository head was at:
66ed3829df959adb47f71d7c903ac59f0670f3e1) before applying these patches in
order.

Suggestions / feedback / inputs?

[1] /messages/by-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v2.tar.gzapplication/x-gzip; name=partition-wise-agg-v2.tar.gzDownload

����Y�]{s���������D^���a'w����4�����v�;;��,6��������x�"($����F�s��EA�F����wC����������	���~��@�L��i����`I:c8����]6�aF(��?1�CKu���_h���?i������E8�����������(�Ee�����7&�pdf��F��0x
|���7`h�6�6����6�B�0���c�"�vF�W	.�-0&`0x����a��y
~a2���������6��/#�Dy�o�7��yMi�4�����"���kc�z0���9�\��?���������`�����&`�	�n��nA7;����{�&�'��}�pKhbg�uFr��U��`xv�AB��A���Yh/���0X���?���q�"Y�8�d��_�
�I�:4�;`���C@�`y��0�d�%)��
��;$8���{wo������D��pXE|z��5�tw>{{�n�~�6����,�=���c�]:���Q�oN&c�FT}������+���V,��[�7�
���cj��mg�h$�|<���p���lv��=��"DU�L"�o�+&Y�q���K�]������#i�9ZypC��!M��;���+�Y�YZ��V\��(X�7��K,p�N3��0���������9}-P�1����b�(�[-������A�������Ami�lu�<R�8/H:9�#Z���
z��;��|�A8����4�s�����j=dy���������.�GWSpu�����"r ?�."]7���d| �������p����k��Q���B�������������%�p��hox����?a�i�*���cz��sN�����bz�����c�������ILZjM�rz:}w\�#��z��$��?##i���
H�������$'��m_����3:���Kr���������N����&�(�8�X(�5�����K�����G����V�;�PdF��{7Z�����>���9dv�}�zNF����j}gw�O��h�^�t~9��w��W��Nvh�`��{ \�t���Z;4�������p�%��~>����{���#8�_O/��a����������f*�����D���Ut�"XFh�>V�JK�C;Dd]*
1�b���ZFD�;�"%
���dt�����-^:��i��(\%�*'E��esM��L���.9��u+�W.���\�:_1�C����������|)�~�����]W���}��B���T9���~m*e��2�H��2��2lG�P[�0+e��2�J6�����)��YC�����v�ljK��J��J��J��R�B�>$l=�$`j���Pb�dZ��o����+��v���N�$/d��?��57�I
��hR�$)�IG���$�8MZ���&�h1p�V����8mG�X�QEh�Pj] �XV�B����Py������� 	����J4�x��]"������K�F/
���$5���?}��X�WXm���N��O~��!�-�:�������Y�Z�����p�-�C��a��a%���2��e�Bq���om��T��,��YI�fE���e��t��/�rf�ms�EU��&��p5P��#@!�1f�(LH�qP��xB�&9,�d<��"2�����ne����1t+chV�K�������9��<���s�j�>����M���|��J����M�^��9�B��1�u��q�w��q���K�4N>������6��UV�����
wb����r��B�F���*����UT���QJX���	\���+�JM����(�w_���+m�K�W�1���M��Ks�J'&U#���;���F���pv���������&H|e%�CC&���V�
 �F�kX-b�mM�����T�5�zn�7��j�T]�Gr��*6��S��t��]�<Tt��2��Su�&1�m�Vt���j\��9��Z�O�}������i~��/1��0Z�n;����;��(~�����_����e��G��h�����?J��_���v��
��pC�Z�zQw�C�Z����)6�K�1=XHO��[B��L5=�DVP)+XGVP)+	9
YA��$������7���4��x�Vw��]FV%B��N�Q�oIO�	���Z�Z:��N��84M6	�1b��v����<�@w�}:=w�[��4��E"��La���^L�=�������B:3N�t��<_�tJ�k�*/��A+];���I��Z���?�	�����k
��'�)JDE�IE�|�E�"���4]I�8�d�.��ZE^���
w���Eg}�f��h���%�}^*-Q��z�/���+4�
U��K6_Ez+{��%���%���9�="�KQbM�	�0��
���0��>���Y����zY�����dE��J�B
�d*i�F��t
:����'T����Lt���a�2uu����:�|1�j:�J�9��i�����V�����K>i��}�Q�e"D13�7�2Y�)F�,F&�!�:� ���l13�7�bY��QmB�WD���a?=�l7���/���"������=�_���0��uG�|L��C�,I[m�/�T��yx�t1.fWt����-�,�Iq��W�8�9���e]rZM��:�LJ�~>�T���th+��[b�^��I!�YS�-��C�����HT��(�Q>B���5#-4��x��I��Y���J��<��b]*��x�F)C)[�F��,��1�h�r��@�x�aC���C��*4�`����8�do�-�Zk>*A����Jj�;e���c���:2]�����f�R5y��t�%r���Jux�aC���J�UhX���R3���8�8��!��Od��'t<~�������?t:=����������bz������H�K�n�Y�8WT�}w�u��}��Xn����g��byh��\��&�?���lzc��3���X�����:^�����r���7�E?��ud���46�EV<2T�=##_���-K�|��A���A��&#|=���|��0�J�Afz^�Y��\� :f���F/���Ag�`
�kB��b�w����<�*�@Vv�^��53����j���V����R�>�?b��*7���nX��~	�:�1�`�Uf�N�,
�:��UC������7P��������M����o`�oJ��Z�r��j9�����P��2���MUi���1�����������6�w�&�`Z	&�`J	&�`:	&�`*	&�`	&�s��|�S����[)�[2[3QR&�c��X'
�R:��/�;JW�k�e�n)~���;����� �!�m=���FJ���I����2��F���(.�����f*�Hru����a�6�Y9&�Q�rx������J�4
��-��W_�
����X���P[���E��h���V�R<�^��r����^�'B:�r�2�s��)m�`�����\J���^J��(`_J���`��w2W���Q����E7<���������Y-��������1�
�-;���s��r-�(>��P�D�����g�K�So�E��{a~j��{E�Ymi:���_������6g���Ro�=��{�^j%���M�Ymi:���l�~y�����/�K���c��z�S�e����:6�Yf�Y?�2��(�s&��F��Y��;{e�5&Py�	T^d�7�(2�Be!���o_���f�A��������n���K���A�u�]�k����f�m��������(�-D�����F)�/�
��zW3�1�Q�o
���!�k�f���0��x2�����G7� s�h��0w��q�����w�]�V*��J��������t���M�q������nQ����)��KT�~��/��)[�H����T���g|a��\��W��!��WM6��W�R���RT�����^���
	�7�,����A�bLXB�u���F��B�u��dU�8d	���%�,��{qX^~����2���qye��N,���@����-u[���l��1�.)Q�J	�T���E�I!
�fr}�������������(�,��C��,���n���|
([��bP�;JQ�`�^	L6&O���n��e���������E�	78�[n�BIP�5a;�Y��z�*hc�:�i�[����Y����f��47E�$���f;���
1jU�I�V�t��{k�0Pd���lH���	\&�}s#��p�Jgy�a!E�p��
3r�l7�9fX��"K8d�P.9E��<��)������:sN�m�����yB�8��f&d[�F��X�0�wUN1^�(��%�)C�0<��z~
���>��-��<���#Y���7��ChX��u�7�!�.#_�������K�V���.�.s�6D�e�K�x���>�x�iSo����NS����m����7^v����A��g�z��'���<����Nk��Mo��cliZS+-7-xZ��6�����j���+�
V�/��[�����Z��
�wy��:{���wy
B{�opvsO����(p�C����s7 ������?(6������Q�J�|�Oe>�pyE@q;�~��I��XP�Y���P5�^Q�|���xULU!!US5������vST�)�85CwS���3{��.�dEjz�+��N�mV[�y��@����*��s<�zZ��)���a�.�T^���To��/h�xr@{-�c2�}8#�R`��3�K�(��KnL�b��q��@K�$}����+������j��2���Q~��[�T�������*KU]x���9g�}�Hj�YT.��j(���X��P�BG)Dc��B����.P6c���5H�X�M���Zzo��{������{3�fN	��z��[�bzo����U�g����9�7szo�z�5�Wo����e�8��^���6B�G�'n;4�����	�L����)eEeEeE��"`Y����a����F���
�����l���e��%�H���]�8�����������-����g0P�s����7�y~Bl�`E<s)�V����
V�W
b����&�-C�<@�F���Mo����S4}3�Gm
��j�c4D����f5�N- y"�.f��b2 CcL1��s,���^��C7v
�Cwp
Gb��5<c���!���a��C���&BL�5�d!�����8�0�{�?���`�
�2I�����9hK�������b�|w�J"= T�Fk[����h
�>?l6	k�~5����k
B�L��+n�l�k��0{���bz!c��M1��p(�����:���C�^8c���:��C!fs�})�Yt�\�aQ?����+���T�^�����"�z&����^�YY�.����ZL�54���D�j�M�~~��u�����+8UV�4r���i�6e���:�W�������s-��lW�{U�)�&�w'�~S�ks{t�����mF�>����eI�^t�)�?�l?�fsdG~�����9���v#������p��	'�a�Or#����n�vV�F���t���yh���C9#�6����}!V�����^yo��=c`�&`����u<,q�:\Y����YZ�/@����������lJ�� )�Q
72�q�!����\|�[2; /����~�����3~$v�*$�p0���~�+Xnt�n����7�o��B�{A>���5E���K��7��+2��z oF��9wn��3��?�?CR]������B}���'4��r�(���r�l���'T�6���C����-�Z�%����Nq�;�Qq�H�/d[�����p������ID��6���u0���o��a��H�t�?�^���5��0B���1#�D���-��o?��Ob�5�%�,�=��D�7���� ��1���s����|��\y�RP�b����E����!���~��w7;z�>]y)xR�42Y�	9=��F	�^�Zc=��7G+/�7��N�B����H���;�/qT2�����A��h�"p��5��Y�08{(%�y(!L�Go
*K��AmY��T��#������E���A��t��q@?�H��A����7�����<Wl�u��y�
��z���SLO���8��?��?���h`J�cz��sN
����?���O#���O����1s����s6I�g:o�t��^VX0�+���i#{������D�5�t��[M��V�H����������\������kQ�b�8u�SiTM�[,!1��4&������1Q��8i(\���L%t��H
��/aO���;'���;�K��=V�7O~�:.����e��G!�t������3���F���aH�	g��)���p��RZ����RZ����RZr��JiI�mJK��5���kV�(�dD`U"D����#2���%I�|�`�7oh�P�6x$�84M6	�1b����{�Lt��32�y+j��{�����$
�W�Wt|���y���Y�-���e�$O�yw��IP��"��������t7��J�'�1gbR@_������Cx&R������X����xF"���o�k�r_���z����DN4UyN#	��i[��'u!"������.�{P�j@j�����lk�L�C�����:4��1o={G�������������[�����r}���
�9~uq@�}9�"����i#����AiZ�2����/��vK�b=������}��4�)�m������E;�g����{�g��f�[f�o����_5��?Y����?W��/��?M �/��?F{���~� �5�����$�P^W�b~��4��(�������������-��}~5�~(3h�c������W*����c+�j��G����� �j�`u��F��^���`�����b2 ET_������N���^��0��U��������v����j��n���h|	l"����0����m{�������h2	{���:�q����������~8�t�I4�FG��|.QNn4���l����C�#��*����d�����|��+����>Z\?/�>����������=���?��*�v�Q:��'���?�x�Z����M���zJ�fy�^*�_8K�&�w�X�k�aSDa]����������t6�-{�*�U���&�S�p�O���Nu����7wG����d����2�	���%�%�QX�@L���j��8ih�G��������h:��p0�@b�������0����%y����u��0���K����=6��E�_��*&��A�>�rN���]:�]�/�����?,TL�i�'c�������u��om(6��Z����>���W���������*\��h��?�_������N=;�GUY>Z��������������|������Q0�����y$'���l�r�'��`41^l��t"���������4���j���;a<��l����^�\i�c<�pzj��K��|���a�	F
pO1�Z�������~����,*�3>�\�Oj��
6[�_����f�S��3�KB<'�K
"���Ak�h;�����NH'�>S��9$a{J��	���*��
W��l�Y���r#�@���@Q������zJ����d�hq���t.)�p�^����U~�f��:�H^<��p��j�\-`8H`�����g�NRU�$��#;H�h�s��P�$~����Fp�FO�@����x�G����l�j1��g���x�L�����j���(t(��St&��NDY��3����~�(���
�w�]�v#v�"i�9q���Z��,WD�Dqd����q�5P���4��E�w���\���
|����x���&}9�=����Lq��M>�&5w������<U��"��t�KU�F>?��q\$���}�����`��;�Wj�����S������g;t\��w��O�����x�U�>�u�<W�@�G������|�
�aM*���
.�50'�M�����*�0	j����a\�8X����7����x)�}���t��0�h����e���l�a!��^��k���8�(�^�����;��Xq�1��9��(1
a'5���a`�Z�&M��UOb�2L�/������j�`���c�6��%B�QK,L���:���{�d<j�Ak�lL�7�Xn"x�0����}�T��� ���-�7|}�������fs�������a��u;�F���v�f���4<�k����g��_��U��fY��6��
/C����O�)L�$��IM`{������H� ��K,�8�^M�fk��u�Td��-P��	� 7[O�H4>�~[8dj-Z;����#��d�#?d�.�E�z}Ah�,��|���;�<i~��[
(	����{�z=������%�-M�����k�^��
~��(�|��QtU>E'�'s�W7��sr>���������M�r�F�� X��}������f5W����!8���g2^�:>�UHx�z8k�[$������3q������\�O��s,��zDg9	�����K�h/��o�A�]���c]����+B��(��*���@~Q������.��m!OV��>/�\�#����_r��;^LX�1M/��s@��%2*{�������D9n�X����W|t���O���)0���+��(\����w��)��p=K/�$�����h��'\9��9EK�xJiq2��h+�p-�z�^�����.V�Y�YOg��%��!�Yw������vpi;������4���r\�1d&��t\U�����c�x���U�����O�r4���d=��5��"_���:aY������D	����Fp-���O�F+�vt'�	�>�2Hmf����Z�V��N��
���&sH~v%D�n���v@��mn�� E��
xW�2�.K���e@*�Lwu��=d���c��1��sM������M�o|��K"�
�� ��^�:������fC�v�����i��5�\�����?��G���t����p8�D#���4�	]���k���D�~�OIF���"-N���X`=�L���v���F��5����W\�8��/��I���q;F��#���?��LX���S��X�G�Y�m�h������4Ih����Ir@�y���z��@2����'Q<cT� s�'�b0����{,eA�qn�t>���C��JZ�����Y<?o�����u��{.��������t���y��k�,^3o���-^s��k�s���.^3�x�]<N��H����9���������O�M���7W���R��^��H|tt^�tX#�z���g�����.��1?��tl��c����+:F����c����/:F���E��,:�f�16���Ut���clc��]��M��%D}�{6�t.��I��3��$M7�9�����g��1:�\BR]vpI��e7�"��t��&xg�\�V�o�������'.��=D���]\�������|]`q��Y\������Y�f��6.n3q��,n�a���X����e���P�%�`9������U��k[��J��(���v��J��������E�����sX�"s�a$��=��,2�����sX�"s�aP��=�m-2�f���sX�"s�a|��=�.2�&���sX�"s�a���=���>wSy�s���	�8Y�,Kb�� �W���O~8��������Uk��'�^�.%/U'����O��^��w�.U����&J��i��!sY�
���x��h���"[�n��'�90Jt���������'��d!����@z<=TM��W��)Z���W�G�_�KB���h^�i��i�s��CW#b�7��h����i��vWY���~0�^�>B;�^X`����,��<��hw�%����ss��-�����/z�Y/F��,c�%;��Iu[�L��N��w�f5 �5r5��?�����v�����j��E>d�=w��x�i4�M�����N�������~��t��c����nz�C����-�	��M�k����z�Df��d�tl��$��'-�kw2�����7�����w�'�~�H��De4���x�+�A�8K��p#�_�+E��{8[Ys:�6��@2�0�`9��hL�I]v�f�>,H�����l���F��f�??/�%�AF��N������
Vo�)b�y�9Y���������eU�R���FI306g��F�@@������u:��q�=\W�7>���� ���hf�p���A9"����r�tK@9�6�P�����Qx�Ff��g��o�}�~��Y�x��}<��x�o�D���K���^�L��"O�L�@���b��]�{���E�����u(]�q3r���;�������[�8�&%'���H����q���S��=���i�S,�;r�t�CP�yG[��cc�)����7u?dSY����m��a*�K���������J��K�z����2��2�Z5e�@k��o?><�o��p���
��m��Y�;Xi���h�K�\���
���!�&M'b��!�����=�&
b�0�2��fn�m��	`j(S�������0��g�<�MQ���&�e�������{m�\i�.�
��I~���
D�D�
D�!��g� @���. 6�@l>$�������X�:)��C�TtA�d*4��M��V@��3����>�'�x����}��7������o����i�7-���9/s^���x�#�3G�g�����9?k���}�=��d�����r9Y4��������~�����������g���x2|0��K��3���!a��C�+�^Y���)H����,9��K�����
�?!Y�s���?�.��-��O����?��-�[�p�������,^�Z�~�|H$�!n�P�������"���j`_�����|a������M��6c/l�^�4��zas�����M��6/l^����Qwa������-���f|��[��.[M�������In>�������^����n������V����M��BJ3/�1<����s���3�IT���,���������)
�N��o�24Jd���\�t|b#�pF�8���c=F3���^'r(	D,����l�YHd�;�����kY@%��Y>Y�Pe@?
o#4��l���V���I����N2I�NFf�|�&
(/��e�F�\[Y.g�6��a���|5�QH������6_��\�:\]��_�W�r�N�2������:$:E='�4�k@�����l��g���CY�X�����=1���v��D	������z����
�~������&�. OiPGv� �w�_����������U�^,f�
G�14����)��������q2�^��OO��x���Q<��[5�}x�~�D����9-��c����&35�:�O�����)*�p�W������p�ub�b2���������.>`R_�����~}��H$[/�*-VcN����`N��b����1k��6	=�I� ������!��?�
��:�������8����[��P�J�7s^�n�UE]����^�K�h:����e��yM?`l�:o�n-��f:����u>�"��(���9���#@{o@c��H�x����u�����Rc�=�W�)��'�w-�Z�F�=�d��{�lP��N�E��I��/(����\3�k�(i5�$w�P��t��y�������k��`~���of��c��+����9Y����������z��5��yn�6��/���g��Eh�F�V�������{��j���X�������q5���!K��w�Z�#�M���V�|�G����4z>�3�v�����f�r ������ZX�QM�2��'��E*�pC~������AKqi���L�X[�!{���Z����E��]�F�~�{HW��o�H-���-�>�w`x����#����k��p3x�~��BB�(��W�����t���n>�[�m����?rw���.\����������%��C	W�;�LW���+,�11�7�����W5L�1��"�U��Q�i2/��?�r�%Y=sN:#�%���cL�����j6�?������bM�>�Q{uUH�t%�W�^o�h4��p�5�IcW��Ua��aI���
�R���H�89V��Z������"�W��^H���Z2 "z�VRoeE;�g�"���7C�	O���-�������H�~
'-�h"s,��	6V��B���p����2���
m�u���;������-B;@����KAM��HF����1���%�Z��N�%=�3���{ ������W������W��?���������G��ln4�V_��$ ��6^�tA���EC����B�0����[�C�Ic�����m��P�4�3��U��
����l��P��4�TeC�Zi
��J�
|-���|*f��B�LB�s��T'^��e���M��t����H^h�n�v��t���?��x���2�{3��#�
xm�
� 7a���roz��a]��1��`}U��9@�l0
����1��hc1s6�.1#����-2�.9��������f����"�B���}L�
�-h*�S3������k��^^r�J�e6��LR�eW�Y���:�`V{����A�M:�;���D<�p���@w�`��[�e�\�m]�yq���>����t1�����F��jt[�q��7�u��E�(�k����i���@.���d�4�_�[�Q8�,�@T��	�>h:������~K/]�j�]�;��w/Z��X�D����f�X����d���b�d�{<3N&�x
�^�y�;[���%�h�>����0T&a+IRY���LW��O�L��m�.����9����e��osVc�����������i��`r]���+�jJ�z���V�v��f���R�eq��Y�]*$������~L������z8��V��/F�/�p�6��2�n[]�� Y�s�d�*�EQ�l������c�dMo���@�fK���Z���	���Gs/O������FrqHB:~�K�����Q���N������
��o������l~�uKq���"���%��h���hT��z�~�Is��
��ffB2"���u�H��'������re��/�!��x��5a�����;yG)��|n�j����0��c
d�������H{��*LL�<������|��bv�B����������^�,dEV
�bk 
�����FLM�����t�^��py����V��N�nv������
C���&~{�lu���=����?�;��7nC|����a�w��g��0���M2��vS���a
���"Bt��h������m��#����p��H�P��dD�B
�O��k��@�b4�����^/����X��*���Z�n���O�~'��*�G�7���.N��� ����������A#�p�� ����h5���M����4v��B;�N	�����j��v��`:W]W��%�CC�:�|�3���6/5!h��Y���I?��!��0��/�u
(�
$���-B(��d�K�*<q���F��b|*�y�`����P��4�&��a]��W�.��f�[�����e�?o0z����"������l'�k�r��� Q|�3"��f�������c��"aX����{
�lK���R�0X���F��B��$=Ck,�v~t������6�d�Y�;!�k�
�4���b��=��JOIUK�s��Bp�N�I�<�'���G+����/� >}�_���Y!�Y�"t7��iJ7B�`4�r�8���D�{{I������MH�Y����/���

��zc_��|���cx�"n��9
�@?�`�X����I�f����7�$%���
B��t�����"��T��*�� &169�������v�P���y���F�1��� N�Q�%{2U�C~[4��[8���}�.��M���|�[s�"�
1b� �!���$��p�P���4Y|�����wg�/�!�A���I�m�&
h�0�w�U�8D���roYoc�\P�2c	���~#R��K ��~���7�k�E����7��K%�/���,I��w��9
�_��6�9���g@Y���c��(B�LV`v��oL�����M�D�����W
v)1:L����"�B�=�!:#"
F'R�
��!��D~�k��E�x�`M�Nk���:�O����6�9�]�F$���s���S�@ng�����p�#"����)>$�f���r�9O=����>���9�O��P���H:�����&������]7�c��}
>��F6�Y��C�
���"e�k�a�1�L<o0���6@TC�
�����p����.0��1k&����|��I/��]x��#��ia4l>#�.F���3H��E
�|�'!Y$^���$L�Q��$�"����6M}c����N��������g��S��I�;���~���|�$�)�I��*�����Si�4]�F���A�J:����%�
�x�l����j@aJ6x�^NkEN��>��X�ql��b���#�A?�����>�1�����k��7�Md��8���g���
�p�r���Lqq��s���6�m��Q��Z9���u�i����N�0��fFZR)�HQ\��)�V`�6zq��L5�
 �8��x�"� A\	00'l���~��$��x+A}����Q%�s:���
�����
�0�:��e��
��8������u'��lq�f|����7k� ��5������y�a�+��!K����F���������|��"��]��Z3�	�<bc�9Z�+�J�U�����������	���Y|�W���-��>�p[�N�4C�����0�%�"�q.@������YB.�����[���l���@���+�K�:�%�"��.�j���I
����HE� ��S��l"�$w��@]�qk��|k2��
�<e��Ii�7��@�H1�w�J�����x�6Cqw���������:�a����O�5��5�o��a� M���e{86� �dh���=����Bc>���W���%qH4r@@W)�B���
<F�L�V�����1�I�`����t�7�j��
�K!��,U�!�������K�V5����0��M8�4��Z����!�����O��	�yGT�'O���o:%�E
XFX�.���u������ �&�6k���gL�^Rk����vN
O��G��������D�[�AS�Z��������w�N���2@�D��Z�/jf��O���-�v� ,���2'��?Pgwr�A��p���f������w�2ne1�FA���i�V5�n4Y[���QQ�d�l��/�r�-��3��Aq��(����Mt��2���x���xi?����p�6��8�I��#�Z����.���U��&yo����j�%7���iz�e��O���u��\�0�����R������`z��<�,�R	���}tF�+U7���lO
�����V�e�$T�I�qW�jWt��R��m	�W��/�`�
��_1d]\%1�B�������T
��T���t����}^����zgwg���a�R)n�R��i��p[H���^%Y>9����mQ|�TQs-b��&����%CnW���Y�*f�<��B���i�b����y�b�"0
�lB����m��
��feL��]\`�M�fVL�@�����x�����K17��k����nV)l�����R���)e�E���Z8[8�����K]��z�^�;��P���n�7�5j~`�R���:��[���l�~b!���B��Pq+���0h?K��4j����9�6�crB�&���-N*�Pd�� 0��#d����`�Y�g!b��H�zj�w��,[r�m�z]�l�������&�;�6�������<C�!�O��bm�{����`H��dQ���f�bP��$�-�hObS��[�hIf[����K��2$��n;�J�'����#����P��D5���bAb��
���Ft�#E�Fb���q���n�����,Fl&�e0�%b{�Gw7�
��EdO��"O_m*�8���G����6��K�>���,d�<8(���s�w�P��e���j���Z��}g��I�y��i�x��Vyq�1Gb�"�C=n]P<�z�|��W��\����?��o���e
�����f3�n�sfw�p���K��8��W�e�c��=�'������Hs��mqL#���n�c���F��86�^���IY�����68l{��6�����3e�L$s	��-�KTO+�PD/ ���������6��Mn,c������()w'E�(��r����6z��[�z'��6�����J�L���A��m����s�o���*�������3�6�OV����B8�j���
GS�4>�;��j�+?���z_�}���
�A���f��Fg��Z���z����kG��n����z��VE����_��v���d���b�/���FM��L��a��%i���~�N�J6����S��DU�(FV���J��)*z�i7��[ u�c�Sx��l����|G�6����������3
f�*95pp}x����Q!���#3��|g\438ZV{��tv �F��D�h��IG���r\�oE�5���9+���4���
UEE�dQ
�Y���Q��jz���Wc��GZ�5�D~����v��r"��
��ly���,a�����:�R@�v/�t���{���9U��/T5�Ma��8���j�������Ko�a�{��?B����'a� ;�]��t��t
m���(�~S��9BM�Zq�'�����x��o��Go���Y:~�� �	�L���og��pm��%��l����H���7�^a�!Ys�S���]�<���s�U��i�d��eL���`9]���	��g*��f�k��z�k�;{��i4HM�*�:����x:����{y,K��Qr[p�:\�e�X�y��7��A��o�b��,4�
���ZY����N4��~0�5~g�q^�$P�]��nw�$�s�B8���
M�O`h6&21\�"~�
�g�4�>�D�3J���Q����{l��m`r5.�4�� ��v������Q�=r��n!k�ep��}��h���^x�L1����	�q�v������k���,��l}�5I*��jA�W�*^�D=k���0
���*<U?����Q���V�n6C�Y�w���k�����@�RE���T>x�rj��PB��5^���
�IZ���:�xvrq<x��I�������2��O?�M��*vt������wo���!S�3�HS�'F����5~�gS%{����D�YJ$�[�9K$Q�L�q	��-(m�<.|3�K��t���Brg�ilCj��,�6���
�{�����Z�����e������������,�?Y���w�����x�x�2�X�>n�G@p><�f��Q�*��5i�x"^\����J��XUu�TF��I��� z�������8��_��mF���a�a�`�RA�
}�/`j�����D��4���Y�*'�S�*����l��E�������qk�w��y;B��O��f2���!��&���aH��%�����v�/L��q���\rf�����/�e]�M��8��s(f�������>(Tt�_�� ��d�"y���R����&�����^#;7���cdg��/d����������Q+:���BoIMGv��4�C�(��s���Ss����r2�4_���!�a�m�-�x��fW<@�OL��wl|�vBW�8�)�4�o�i��s�r��h=�REg�L�\l��F{(�0��l�@*!N�Ml�������@}F<�dT�Saf����D�D "y�������/��T7��w��	"�)��R>��x��Nf������5^����+��W����_kPI��<@[V"w��4�j�?�1����|['�D�EQd&����0R)�`w`l��u%=�zWJk��� ��e�cb���VH�P�-3R�E�����p��F�kX��5*PD�O"=�b��C%��6��C��-��2��O�J�0�I�6��n�fM�{>�^!��S��m8*��H�3A�5\!�TGA�!�Z�j���!��>9=�������a��i�ip�����St�'ZE3�*���|����}�����/��h�@��T��!����w�����a"*�`�;">���i��{o>�.~	��1������?c�;��a:E	�Q�$N9!�P��
j���Y$w_8x�,<�����,��g��Ld��d��N���V�x������r����<�'�mf���b�8�s���&�Y�b�K�:�c�s���q�P���d&@��\�8f����S�X�*�����������7��zw2-7�t������t�G�f���0�J�N��[J�d����#,�e�U��bf���Y�L���nJ�:���t��T�a�d�.
��$"9\�e-�|�V3U>9s���"cMsL��E�.���I�+vW1��S�|n[�����I�U�W�yD�^a:=��5$iWS����Qk�Z�}��])vF0Z��F
��9��W�n���=��4��eZ��S��.;�=�L���1:��d87}C�v��V���M\��0����y��.�6g[���d�|��������f���;����w;�-����|���������z�����-�/���n���?��v�+��s��s�'6���������%dV�e(�Qon�������P?]�x�d�R���������@�E��b>$���M[�������Y�������������e|E&�5
#R�U0�0O�
�3)��f�^�����t$H!��t|�5�M�x@���%V�@��z��ET���h�V��$R��p����d?�q���Y�=��H���+J��GFI[��| ��J��a��:��Iz�����P����Q����N�A6�f�Y�(R�$�l?@�x�C���0k"8����\	�,�hFkq�E�J_Sr�z��@���q)�2�$J<]1u1�S��s��m_��&=�J�Z	��c��(r�������)��&��[����p�*r�h(�>6Q�����s��B�Mj�7h���:-�f���a����C�aA�?u	
��:����}��R��������h:�V�xO����k(4��f�!�<��7"��j��1�rO�� >�����Yp��Qq-X��9p����%z/QmsM
*�"_N�i�3a@E[
�;
a�gT��]��	W!����������������dr|t�����>}x���
�kGh(~]���b��KG����Y���:�����������^in����r-����@uq4��5d\�a�J�q������:|o�*D��)��}	_!./5;��!��872�������ss��n�P��{������T���+<���j�e+D��fcv�SCx���_�����tm5���
���y��o��fhF'�#R1���}XG�:�"d�������*��u�#4�������������Z�*�����]������|�

#14

Rajkumar Raghuwanshi

rajkumar.raghuwanshi@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#13)

Re: Partition-wise aggregation/grouping

On Fri, Sep 8, 2017 at 5:47 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Here is the new patch-set, rebased on HEAD (f0a0c17) and the
latest partition-wise join (v29) patches.

Hi Jeevan,

I have started testing partition-wise aggregation and have one observation;
please take a look.
With the v2 patch, if I change the target list order, the query does not pick
full partition-wise aggregation.

SET enable_partition_wise_agg TO true;
SET partition_wise_agg_cost_factor TO 0.5;
SET enable_partition_wise_join TO true;
SET max_parallel_workers_per_gather TO 0;

CREATE TABLE pagg_tab (a int, b int, c int) PARTITION BY RANGE(a);
CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10);
CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20);
CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30);
INSERT INTO pagg_tab SELECT i % 30, i % 30, i % 50 FROM generate_series(0, 299) i;
ANALYZE pagg_tab;

postgres=# explain (verbose, costs off) select a,b,count(*) from pagg_tab group by a,b order by 1,2;
                          QUERY PLAN
--------------------------------------------------------------
 Sort
   Output: pagg_tab_p1.a, pagg_tab_p1.b, (count(*))
   Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
   ->  Append
         ->  HashAggregate
               Output: pagg_tab_p1.a, pagg_tab_p1.b, count(*)
               Group Key: pagg_tab_p1.a, pagg_tab_p1.b
               ->  Seq Scan on public.pagg_tab_p1
                     Output: pagg_tab_p1.a, pagg_tab_p1.b
         ->  HashAggregate
               Output: pagg_tab_p2.a, pagg_tab_p2.b, count(*)
               Group Key: pagg_tab_p2.a, pagg_tab_p2.b
               ->  Seq Scan on public.pagg_tab_p2
                     Output: pagg_tab_p2.a, pagg_tab_p2.b
         ->  HashAggregate
               Output: pagg_tab_p3.a, pagg_tab_p3.b, count(*)
               Group Key: pagg_tab_p3.a, pagg_tab_p3.b
               ->  Seq Scan on public.pagg_tab_p3
                     Output: pagg_tab_p3.a, pagg_tab_p3.b
(19 rows)

-- changing target list order
-- picking partial partition-wise aggregation path
postgres=# explain (verbose, costs off) select b,a,count(*) from pagg_tab group by a,b order by 1,2;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Finalize GroupAggregate
   Output: pagg_tab_p1.b, pagg_tab_p1.a, count(*)
   Group Key: pagg_tab_p1.b, pagg_tab_p1.a
   ->  Sort
         Output: pagg_tab_p1.b, pagg_tab_p1.a, (PARTIAL count(*))
         Sort Key: pagg_tab_p1.b, pagg_tab_p1.a
         ->  Append
               ->  Partial HashAggregate
                     Output: pagg_tab_p1.b, pagg_tab_p1.a, PARTIAL count(*)
                     Group Key: pagg_tab_p1.b, pagg_tab_p1.a
                     ->  Seq Scan on public.pagg_tab_p1
                           Output: pagg_tab_p1.b, pagg_tab_p1.a
               ->  Partial HashAggregate
                     Output: pagg_tab_p2.b, pagg_tab_p2.a, PARTIAL count(*)
                     Group Key: pagg_tab_p2.b, pagg_tab_p2.a
                     ->  Seq Scan on public.pagg_tab_p2
                           Output: pagg_tab_p2.b, pagg_tab_p2.a
               ->  Partial HashAggregate
                     Output: pagg_tab_p3.b, pagg_tab_p3.a, PARTIAL count(*)
                     Group Key: pagg_tab_p3.b, pagg_tab_p3.a
                     ->  Seq Scan on public.pagg_tab_p3
                           Output: pagg_tab_p3.b, pagg_tab_p3.a
(22 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

#15

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Rajkumar Raghuwanshi (#14)

Re: Partition-wise aggregation/grouping

On Tue, Sep 12, 2017 at 3:24 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

Hi Jeevan,

I have started testing partition-wise aggregation and have one observation;
please take a look.
With the v2 patch, if I change the target list order, the query does not pick
full partition-wise aggregation.

Thanks, Rajkumar, for reporting this.

I am looking into this issue and will post an updated patch with the fix.

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#16

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#15)

1 attachment(s)

Re: Partition-wise aggregation/grouping

On Tue, Sep 12, 2017 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Thanks Rajkumar for reporting this.

I am looking into this issue and will post updated patch with the fix.

The logic for checking whether partition keys lead the group by keys needs to
be updated here. The group by expressions can appear in any order without
affecting the final result, so it is not mandatory for the partition keys to
lead the group by keys in order to have full aggregation. Instead, we must
ensure that the partition keys are part of the group by keys to compute full
aggregation on a partition.
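
As a rough illustration of the revised check (this is not the patch's code), the sketch below models partition keys and GROUP BY columns as plain column numbers, which is an assumption made only for this example. The helper names `column_in_list()` and `partkeys_in_groupby()` are hypothetical. The second function returns true when every partition key appears somewhere in the GROUP BY list, in any position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if column `key` occurs anywhere in `cols`. */
static bool
column_in_list(int key, const int *cols, size_t ncols)
{
    for (size_t i = 0; i < ncols; i++)
    {
        if (cols[i] == key)
            return true;
    }
    return false;
}

/*
 * Full (non-partial) aggregation per partition is possible only when every
 * partition key is one of the GROUP BY columns.  The position of a key
 * within the GROUP BY list does not matter.
 */
bool
partkeys_in_groupby(const int *partkeys, size_t npartkeys,
                    const int *groupcols, size_t ngroupcols)
{
    assert(partkeys != NULL && npartkeys > 0);

    for (size_t i = 0; i < npartkeys; i++)
    {
        if (!column_in_list(partkeys[i], groupcols, ngroupcols))
            return false;
    }
    return true;
}
```

With a table partitioned on column 1, both `GROUP BY 1, 2` and `GROUP BY 2, 1` pass the check, while `GROUP BY 2` does not and would need partial aggregation.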

Attached is the revised patch-set with the above fix.

Also, in the test cases, I have removed the DROP/ANALYZE commands on child
relations and removed VERBOSE from the EXPLAIN.

Notes:
HEAD: 8edacab209957520423770851351ab4013cb0167
Partition-wise Join patch-set version: v32

Thanks

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#17

Rajkumar Raghuwanshi

rajkumar.raghuwanshi@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#16)

Re: Partition-wise aggregation/grouping

Thanks for the patch. I have tested it and the issue is fixed now.

#18

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#16)

Re: Partition-wise aggregation/grouping

Hi Jeevan,
I have started reviewing these patches.

0001 looks fine. There might be some changes that will be needed, but
those will be clear when I review the patch that uses this
refactoring.

0002
+ *
+ * If targetlist is provided, we use it else use targetlist from the root.
  */
 static double
 get_number_of_groups(PlannerInfo *root,
                     double path_rows,
-                    grouping_sets_data *gd)
+                    grouping_sets_data *gd,
+                    List *tlist)
 {
    Query      *parse = root->parse;
    double      dNumGroups;
+   List       *targetList = (tlist == NIL) ? parse->targetList : tlist;

Maybe we should always pass the targetlist. Instead of passing NIL,
pass parse->targetList directly. That would save us one conditional
assignment. Maybe passing NIL is required for the patches that use
this refactoring, but that's not clear from this patch alone.
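
A minimal sketch of the two calling conventions being compared, using simplified stand-in types rather than PostgreSQL's `Query` and `List`. The function names `count_targets_with_fallback()` and `count_targets()` are hypothetical and only model the shape of the interface, not what get_number_of_groups() computes.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the planner structures; not PostgreSQL's types. */
typedef struct TargetList
{
    int         nentries;       /* number of target entries */
} TargetList;

typedef struct Query
{
    TargetList *targetList;     /* target list of the whole query */
} Query;

/* Variant with a fallback: NULL means "use the query's own target list". */
int
count_targets_with_fallback(const Query *parse, const TargetList *tlist)
{
    const TargetList *target = (tlist == NULL) ? parse->targetList : tlist;

    return target->nentries;
}

/* Suggested variant: the caller always names the list it wants. */
int
count_targets(const TargetList *tlist)
{
    assert(tlist != NULL);
    return tlist->nentries;
}
```

In the second variant a top-level caller would pass `parse->targetList` itself, so the callee needs neither the conditional nor the `Query` pointer for this purpose.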

0003
In the documentation of enable_partition_wise_aggregate, we should
probably explain why the default is off or, like the partition_wise_join
GUC, explain the consequences of turning it off. I doubt we could
accept something like partition_wise_agg_cost_factor, but we can
discuss this at a later stage. In the meantime, it may be worthwhile to fix
the reason why we would require this GUC. If regular aggregation
costs less than partition-wise aggregation in most cases,
then we probably need to fix the cost model.

I will continue reviewing the rest of the patches.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Ashutosh Bapat (#18)

1 attachment(s)

Re: Partition-wise aggregation/grouping

Thanks Ashutosh for reviewing.

Attached is a new patch-set with the following changes:

1. Removed the earlier 0007 and 0008 patches, which were a PoC for supporting
partial aggregation over fdw. I removed them because that is a different
issue altogether, and I will tackle it separately once this is
done.

This patch-set now includes support for parallel plans within partitions.

Notes:
HEAD: 59597e6
Partition-wise Join Version: 34

(The first six patches, 0001 - 0006, remain the same functionality-wise.)
0007 - Refactors partial grouping paths creation into a separate function.
0008 - Enables parallelism within the partition-wise aggregation.

This patch also includes a fix for the crash reported by Rajkumar.
While forcibly applying the scan/join target to all the Paths for the scan/join
rel, I was earlier using apply_projection_to_path(), which modifies the path
in place. That caused the crash, because the path finally chosen had been
updated by the partition-wise agg path creation. Now I use
create_projection_path(), like we do for partial aggregation paths.
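
The difference between the two approaches can be sketched with a toy `Path` type. This is a simplified model, not the planner's data structures, and `apply_target_in_place()` and `wrap_with_projection()` are hypothetical names standing in for the in-place and the wrapping behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a planner path; not PostgreSQL's Path struct. */
typedef struct Path
{
    int          target_id;     /* which target list the path emits */
    struct Path *subpath;       /* input path, or NULL for a leaf */
} Path;

/* In-place variant: every other holder of `path` sees the new target. */
Path *
apply_target_in_place(Path *path, int target_id)
{
    assert(path != NULL);
    path->target_id = target_id;
    return path;
}

/* Wrapping variant: a new node is put on top, the input is left untouched. */
Path *
wrap_with_projection(Path *subpath, int target_id)
{
    Path       *proj;

    assert(subpath != NULL);
    proj = malloc(sizeof(Path));
    if (proj == NULL)
        return NULL;
    proj->target_id = target_id;
    proj->subpath = subpath;
    return proj;
}
```

If the same path is also referenced from elsewhere, for example as the cheapest path of its relation, only the wrapping variant leaves that other reference valid.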

Also, fixed issues reported by Ashutosh.

On Tue, Sep 26, 2017 at 6:16 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Hi Jeevan,
I have started reviewing these patches.

0001 looks fine. There might be some changes that will be needed, but
those will be clear when I review the patch that uses this
refactoring.

0002
+ *
+ * If targetlist is provided, we use it else use targetlist from the root.
  */
 static double
 get_number_of_groups(PlannerInfo *root,
                     double path_rows,
-                    grouping_sets_data *gd)
+                    grouping_sets_data *gd,
+                    List *tlist)
 {
    Query      *parse = root->parse;
    double      dNumGroups;
+   List       *targetList = (tlist == NIL) ? parse->targetList : tlist;

Maybe we should always pass the targetlist. Instead of passing NIL,
pass parse->targetList directly. That would save us one conditional
assignment. Maybe passing NIL is required for the patches that use
this refactoring, but that's not clear from this patch alone.

Done in attached patch-set.

0003
In the documentation of enable_partition_wise_aggregate, we should
probably explain why the default is off or, like the partition_wise_join
GUC, explain the consequences of turning it off.

I have updated this. Please have a look.

I doubt we could
accept something like partition_wise_agg_cost_factor, but we can
discuss this at a later stage. In the meantime, it may be worthwhile to fix
the reason why we would require this GUC. If regular aggregation
costs less than partition-wise aggregation in most cases,
then we probably need to fix the cost model.

Yep. I will have a look in the meantime.

I will continue reviewing rest of the patches.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#20

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: Jeevan Chalke (#19)

Re: Partition-wise aggregation/grouping

Thanks.

Here are comments on 0004 from the last patch set, but most of the comments
still apply.

Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.

The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey(). The prologue explains why
partition-wise aggregation/grouping would be efficient with this restriction,
but it doesn't explain why partial aggregation/grouping per partition would be
efficient. Maybe it should do that as well. The same applies to the partial
aggregation/grouping discussion in the README.

+ /* Do not handle grouping sets for now. */
Is this a real restriction or just a restriction for the first cut of this feature?
Can you please add more explanation? Maybe update the README as well?

+ grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since the group by clause
does not contain the partition keys in that case, rows from multiple partitions
participate in one group, and thus the partition keys of the input relation do
not apply to the grouped relation. In this case, it seems that the grouped rel
will have part_rels but will not be partitioned.
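
A small worked sketch of why that is so, assuming a table partitioned on column a and a query grouping on a different column b whose values lie in 0..3. The `Row` type and the functions `partial_count()` and `combine_counts()` are made up for the illustration and only mimic a partial count(*) per partition followed by a finalize step.

```c
#include <assert.h>
#include <stddef.h>

#define NGROUPS 4               /* group key b is assumed to be in 0..3 */

typedef struct Row
{
    int         a;              /* partition key */
    int         b;              /* grouping column, not the partition key */
} Row;

/* Partial step: count rows per value of b within a single partition. */
void
partial_count(const Row *rows, size_t nrows, long counts[NGROUPS])
{
    for (int g = 0; g < NGROUPS; g++)
        counts[g] = 0;
    for (size_t i = 0; i < nrows; i++)
    {
        assert(rows[i].b >= 0 && rows[i].b < NGROUPS);
        counts[rows[i].b]++;
    }
}

/* Finalize step: fold the partial counts of one more partition into total. */
void
combine_counts(long total[NGROUPS], const long partial[NGROUPS])
{
    for (int g = 0; g < NGROUPS; g++)
        total[g] += partial[g];
}
```

The same value of b can be counted in several partitions, so a finished group does not belong to any single partition and the partition bounds on a say nothing about the grouped output.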

+        /*
+         * If there is no path for the child relation, then we cannot add
+         * aggregate path too.
+         */
+        if (input_child_rel->pathlist == NIL)
+            return;
When can this happen? I think, similar to partition-wise join, it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returning from here half-way is actually a waste of memory
and planning time. Instead, we may want to loop over the part_rels to find if
any of those have an empty pathlist and return from there before doing any actual
work.
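
The suggested up-front check could look roughly like the following. `ChildRel` is a simplified stand-in for a child relation and `all_children_have_paths()` is a hypothetical helper, so this only shows the shape of the early exit, not planner code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a child relation; not PostgreSQL's RelOptInfo. */
typedef struct ChildRel
{
    int         npaths;         /* number of paths in the child's pathlist */
} ChildRel;

/* True only if every child has at least one path to aggregate over. */
bool
all_children_have_paths(const ChildRel *children, size_t nchildren)
{
    assert(children != NULL || nchildren == 0);

    for (size_t i = 0; i < nchildren; i++)
    {
        if (children[i].npaths == 0)
            return false;
    }
    return true;
}
```

Calling such a check before the main loop means no per-child paths are built and then thrown away when a later child turns out to have an empty pathlist.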

+        extra.pathTarget = child_target;
+        extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+        extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+                                                           (Node *) query->havingQual,
+                                                           nappinfos,
+                                                           appinfos);
These lines update some fields of the "extra" structure in every iteration. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets the member values for the last child. Is that right? Should we split
extra into two structures, one to be used within the loop and one outside? Or
maybe send the members being updated within the loop separately?
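
One possible shape for such a split, sketched with invented types. `SharedAggInfo`, `ChildAggInfo` and `make_child_info()` are hypothetical and their fields are only loosely modelled on the members mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: values that are the same for the parent and all children. */
typedef struct SharedAggInfo
{
    int         agg_strategy;
} SharedAggInfo;

/* Hypothetical: values recomputed for each child partition. */
typedef struct ChildAggInfo
{
    const SharedAggInfo *shared;    /* read-only link to the common part */
    double      input_rows;         /* row estimate of this child's input */
    int         target_id;          /* translated target of this child */
} ChildAggInfo;

/*
 * Build the per-child structure as a fresh value.  The shared part is only
 * reachable through a const pointer, so an iteration of the loop cannot
 * overwrite anything that code after the loop will read.
 */
ChildAggInfo
make_child_info(const SharedAggInfo *shared, double rows, int target_id)
{
    ChildAggInfo child;

    assert(shared != NULL);
    child.shared = shared;
    child.input_rows = rows;
    child.target_id = target_id;
    return child;
}
```

The loop would then pass a `ChildAggInfo` to the per-child function, while the code after the loop receives only the `SharedAggInfo`.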

+        /*
+         * Forcibly apply scan/join target to all the Paths for the scan/join
+         * rel.
+         *
[ lines clipped ]
+                if (subpath == input_child_rel->cheapest_total_path)
+                    input_child_rel->cheapest_total_path = path;
+            }
+        }
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can refactor it into a function and use it in both places?

have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
whether a given clause expression matches any of the partition keys. Or you
could use list_intersection() instead of the following loop:
+        foreach(lc, partexprs)
+        {
+            Expr       *partexpr = lfirst(lc);
+
+            if (list_member(groupexprs, partexpr))
+            {
+                found = true;
+                break;
+            }
+        }
+        /*
+         * If none of the partition key matches with any of the GROUP BY
+         * expression, return false.
+         */
+        if (!found)
+            return false;

create_child_grouping_paths() and create_grouping_paths() have almost the same
code. Is there a way we could refactor the code to extract the common code into a
function called by these two functions, or reuse create_grouping_paths() for
children as well? I don't think we will be able to do the latter.

+    /* Nothing to do if there is an empty pathlist */
+    if (grouped_rel->pathlist == NIL)
+        return false;
When would that happen? Similar condition in case of parent grouped rel throws
an error, so when this code is called, we know for sure that parent had
non-empty pathlist. So, we would expect child to have non-empty pathlist as
well.

+        grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+                                      input_rel->relids);
+
+        /* Mark this rel as "other upper rel" */
+        grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
I think we need to pass reloptkind as an argument to fetch_upper_rel(), now that
we have "upper" relations and "other upper" relations. relids will still be a
"key" to find an upper relation but its reloptkind should match the given
reloptkind.  fetch_upper_rel() is used to create the upper relation if it
doesn't exist. So, with the above code, if some other function calls
fetch_upper_rel() with given relids, it would get an upper rel with
RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL.
That looks odd. Maybe we should have written build_upper_rel() and
find_upper_rel(), similar to build_join_rel() and find_join_rel(), instead of
combining both functionalities in one function.
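
A toy version of the proposed split, to show how lookup and creation could be kept apart. The function names follow the suggestion above, but their signatures, the `UpperRel` and `UpperRelList` types, and the use of an int as relids are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum RelKind
{
    UPPER_REL,
    OTHER_UPPER_REL
} RelKind;

typedef struct UpperRel
{
    int         relids;         /* lookup key (a plain int in this sketch) */
    RelKind     kind;           /* fixed when the entry is built */
} UpperRel;

#define MAX_UPPER_RELS 16

typedef struct UpperRelList
{
    UpperRel    rels[MAX_UPPER_RELS];
    size_t      nrels;
} UpperRelList;

/* Lookup only: relids and kind must both match; never creates an entry. */
UpperRel *
find_upper_rel(UpperRelList *list, int relids, RelKind kind)
{
    for (size_t i = 0; i < list->nrels; i++)
    {
        if (list->rels[i].relids == relids && list->rels[i].kind == kind)
            return &list->rels[i];
    }
    return NULL;
}

/* Creation only: the kind is set once here.  Returns NULL if the list is full. */
UpperRel *
build_upper_rel(UpperRelList *list, int relids, RelKind kind)
{
    UpperRel   *rel;

    assert(find_upper_rel(list, relids, kind) == NULL);
    if (list->nrels >= MAX_UPPER_RELS)
        return NULL;
    rel = &list->rels[list->nrels++];
    rel->relids = relids;
    rel->kind = kind;
    return rel;
}
```

With this split no caller has to change the kind of a relation after the fact, because a lookup with the wrong kind simply finds nothing.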

+        /*
+         * create_append_path() sets the path target from the given relation.
+         * However, in this case grouped_rel doesn't have a target set.  So we
+         * must set the pathtarget to the passed in target.
+         */
+        apath->pathtarget = target;
I think, we should change create_append_path() functions to accept target as an
additional argument. For append rels other than aggregate and grouping, target
will be same as relation's target. For agg/group append rels, we will pass
different targets for partial and non-partial grouping paths.

+        /*
+         * Since Append's output is always unsorted, we'll need to sort,
+         * unless there's no GROUP BY clause or a degenerate (constant) one,
+         * in which case there will only be a single group.
+         */
The append path here can be the output of either merge append or append. If it's
the output of merge append, we don't need to sort it again, do we?

create_partition_agg_paths() creates append paths and then adds a finalization
path if necessary. The code to add the finalization path seems to be similar to the
code that adds a finalization path for parallel query. Maybe we could take out the
common code into a function and call that function in the two places. I see this
function as accepting a partial aggregation/grouping path and returning a path
that finalizes partial aggregates/groups.

[1]: /messages/by-id/CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com

#21

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Ashutosh Bapat (#20)

1 attachment(s)

Re: Partition-wise aggregation/grouping

On Thu, Sep 28, 2017 at 3:12 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Here are comments on 0004 from the last patch set. But most of the comments
still apply.

Thank you, Ashutosh for reviewing.

Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path().
Patch 0004 adds a new argument to those functions for conditions in HAVING
clause. We should move those changes to 0001 and pass parse->havingQual to
these functions in 0001 itself. That will keep all changes to those functions
together and also make 0003 small.

Done.

The prologue of try_partition_wise_grouping() mentions a restriction of
partition keys being leading group by clauses. This restriction is not
specified in the prologue of have_grouping_by_partkey(), which actually checks
for this restriction. The requirement per prologue of that function is just to
have partition keys in group clause. I think have_grouping_by_partkey() is
correct, and we should change prologue of try_partition_wise_grouping() to be
in sync with have_grouping_by_partkey().

Done.

The prologue explains why partition-wise aggregation/grouping would be
efficient with this restriction, but it doesn't explain why partial
aggregation/grouping per partition would be efficient. Maybe it should do that
as well. Similar is the case with the partial aggregation/grouping discussion
in README.

I have tried updating it. Please check.

+ /* Do not handle grouping sets for now. */
Is this a real restriction or just a restriction for the first cut of this
feature? Can you please add more explanation? Maybe update the README as well?

A grouping sets plan does not work with an inheritance subtree (see the notes in
create_groupingsets_plan). Thus grouping sets are not handled here.

+ grouped_rel->part_scheme = input_rel->part_scheme;
Is this true even for partial aggregates? I think not. Since group by clause
does not contain partition keys, the rows from multiple partitions participate
in one group and thus the partition keys of input relation do not apply to the
grouped relation. In this case, it seems that the grouped rel will have
part_rels but will not be partitioned.

I have removed this as your analysis is correct. grouped_rel is not
partitioned.

+        /*
+         * If there is no path for the child relation, then we cannot add
+         * aggregate path too.
+         */
+        if (input_child_rel->pathlist == NIL)
+            return;
When can this happen? I think, similar to partition-wise join it happens when
there is a dummy parent relation. See [1]. If that's the case, you may want to
do things similar to what partition-wise join is doing. If there's some other
reason for doing this, returning from here half-way is actually a waste of
memory and planning time. Instead, we may want to loop over the part_rels to
find if any of those have an empty pathlist and return from there before doing
any actual work.

This is a can't-happen scenario, so I have converted it to an Assert().
And yes, I am marking the grouped_rel as a dummy rel when the input rel is dummy.

+        extra.pathTarget = child_target;
+        extra.inputRows = input_child_rel->cheapest_startup_path->rows;
+        extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+                                                           (Node *) query->havingQual,
+                                                           nappinfos,
+                                                           appinfos);
These lines are updating some fields of "extra" structure in every loop. The
structure is passed to create_child_grouping_paths() in the loop and to
add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only
gets some member values for the last child. Is that right?

No. The patch does update those fields before calling add_paths_to_append_rel().

Should we split extra into two structures, one to be used within the loop and
one outside? Or maybe send the members being updated within the loop separately?

I don't see any point in splitting. We need almost all fields at child path
creation as well as at the finalization step. The patch is basically just
reusing the struct variable.

+        /*
+         * Forcibly apply scan/join target to all the Paths for the scan/join
+         * rel.
+         *
[ lines clipped ]
+                if (subpath == input_child_rel->cheapest_total_path)
+                    input_child_rel->cheapest_total_path = path;
+            }
+        }
This code seems to be copied from grouping_planner() almost verbatim. Is there
a way we can refactor it into a function and use it in both the places?

Done.
Moved this in
0003-Refactor-code-applying-scanjoin-target-to-paths-into.patch

have_grouping_by_partkey() may use match_expr_to_partition_keys() to find
whether a given clause expression matches any of the partition keys. Or you
could use list_intersection() instead of following loop
+        foreach(lc, partexprs)
+        {
+            Expr       *partexpr = lfirst(lc);
+
+            if (list_member(groupexprs, partexpr))
+            {
+                found = true;
+                break;
+            }
+        }
+        /*
+         * If none of the partition key matches with any of the GROUP BY
+         * expression, return false.
+         */
+        if (!found)
+            return false;

Well, the logic in match_expr_to_partition_keys() does not exactly match
the scenarios here. It may match with a few alterations, but then it will become
complex. So it is better to keep them separate.

list_intersection() is a good suggestion, as it would eliminate this block
altogether and leave fewer lines of code to maintain. However, it returns
a list of all matching cells from the first list, which it builds by comparing
all elements. Here we don't need to look any further after the very
first match, so the existing loop saves that unnecessary matching.
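
The early-exit argument above can be illustrated with a minimal standalone sketch. It is not the patch's code: expressions are represented by plain integer ids instead of PostgreSQL Expr nodes, and the names id_list_member() and any_partkey_in_groupby() are hypothetical. It only shows the inner check for one partition key, which stops at the first match instead of collecting every match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for list_member(): expressions are plain integer
 * ids here, not PostgreSQL Expr nodes.
 */
static bool
id_list_member(const int *list, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
	{
		if (list[i] == id)
			return true;
	}
	return false;
}

/*
 * Returns true as soon as one of the partition key expressions appears in
 * the GROUP BY list.  Unlike an intersection, no result list is built and
 * the remaining expressions are never compared.
 */
bool
any_partkey_in_groupby(const int *partexprs, size_t npart,
					   const int *groupexprs, size_t ngroup)
{
	assert(partexprs != NULL || npart == 0);

	for (size_t i = 0; i < npart; i++)
	{
		if (id_list_member(groupexprs, ngroup, partexprs[i]))
			return true;		/* stop at the first match */
	}
	return false;
}
```

An intersection-based version would give the same yes/no answer by testing whether the resulting list is empty, at the price of comparing every element and allocating the result.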

create_child_grouping_paths() and create_grouping_paths() have almost similar
code. Is there a way we could refactor the code to extract common code into a
function called by these two functions or reuse create_grouping_paths() for
children as well? I don't think we will be able to do the latter.

After refactoring most of the code in create_grouping_paths() (0001-0003),
very little duplicated code remains. Refactoring those few
lines into another function looks odd.
Let me know if you still think those few lines should be refactored into a
separate function.

+    /* Nothing to do if there is an empty pathlist */
+    if (grouped_rel->pathlist == NIL)
+        return false;
When would that happen? Similar condition in case of parent grouped rel throws
an error, so when this code is called, we know for sure that parent had
non-empty pathlist. So, we would expect child to have non-empty pathlist as
well.

Yes, I agree. This is a kind of unreachable return.
Do you mean we should also throw an error here, as in the case of the parent
grouped rel? I opted not to throw an error and instead fall back to the
non-partition-wise path.

+        grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
+                                      input_rel->relids);
+
+        /* Mark this rel as "other upper rel" */
+        grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
I think we need to pass reloptkind as an argument to fetch_upper_rel(), now that
we have "upper" relations and "other upper" relations. relids will still be a
"key" to find an upper relation but its reloptkind should match the given
reloptkind.  fetch_upper_rel() is used to create the upper relation if it
doesn't exist. So, with the above code, if some other function calls
fetch_upper_rel() with given relids, it would get an upper rel with
RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL.
That looks odd. Maybe we should have written build_upper_rel() and
find_upper_rel() similar to build_join_rel() and find_join_rel() instead of
combining both the functionalities in one function.

Makes sense. But I am reluctant to update fetch_upper_rel() and all its
callers.
However, do you think it is a good idea to have a separate function for the
other upper rel, named fetch_other_upper_rel(), in line with fetch_upper_rel()?

+        /*
+         * create_append_path() sets the path target from the given relation.
+         * However, in this case grouped_rel doesn't have a target set.  So we
+         * must set the pathtarget to the passed in target.
+         */
+        apath->pathtarget = target;
I think, we should change create_append_path() functions to accept target as an
additional argument. For append rels other than aggregate and grouping, target
will be same as relation's target. For agg/group append rels, we will pass
different targets for partial and non-partial grouping paths.

Done in 0005-Pass-pathtarget-to-create_-merge_-append_path.patch.

+        /*
+         * Since Append's output is always unsorted, we'll need to sort,
+         * unless there's no GROUP BY clause or a degenerate (constant) one,
+         * in which case there will only be a single group.
+         */
append path here can be output of either merge append or append. If it's output
of merge append, we don't need to sort it again, do we?

Yes, you are right, we don't need an explicit sort over merge-append.
I have made those changes.

create_partition_agg_paths() creates append paths and then adds a finalization
path if necessary. The code to add the finalization path seems to be similar to
the code that adds the finalization path for parallel query. Maybe we could take
out the common code into a function and call that function in both places. I see
this function as accepting a partial aggregation/grouping path and returning a
path that finalizes partial aggregates/groups.

It seems that it will become messy. Per my understanding, the only common code
is the add_path() call with the appropriate create_agg_path() or
create_group_path(). Those are function calls anyway, and I don't see any
reason to split them out into a separate function.

Attached is a new patch set with HEAD at 84ad4b0 and all these review points
fixed. Let me know if I missed any. Thanks.

I have merged the parallelism changes into the main patch, i.e. 0007, as most of
the changes in that patch actually modify the same lines added by 0007.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#22

David Rowley

david.rowley@2ndquadrant.com

over 8 years ago

In reply to: Jeevan Chalke (#21)

Re: Partition-wise aggregation/grouping

On 10 October 2017 at 01:10, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached is a new patch set with HEAD at 84ad4b0 and all these review points
fixed. Let me know if I missed any. Thanks.

I've only really skimmed over this thread and only opened the code
enough to extract the following:

+ /* Multiply the costs by partition_wise_agg_cost_factor. */
+ apath->startup_cost *= partition_wise_agg_cost_factor;
+ apath->total_cost *= partition_wise_agg_cost_factor;

I've not studied how all the path plumbing is done, but I think
instead of doing this costing magic we should really stop pretending
that Append/MergeAppend nodes are cost-free. I think something like
charging cpu_tuple_cost per row expected through Append/MergeAppend
would be a better approach to this.

If you perform grouping or partial grouping before the Append, then in
most cases the Append will receive less rows, so come out cheaper than
if you perform the grouping after it. I've not learned the
partition-wise join code enough to know if this is going to affect
that too, but for everything else, there should be no plan change,
since there's normally no alternative paths. I see there's even a
comment in create_append_path() which claims the zero cost is a bit
optimistic.
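
To make the row-count argument concrete, here is a minimal sketch of how a per-row Append charge would favour grouping below the Append. It is an illustration only: append_row_overhead() and append_overhead_saved() are hypothetical helpers, not planner functions, and the per-row charge is simply passed in as a parameter.

```c
#include <assert.h>

/*
 * Hypothetical helper: overhead of pulling 'rows' tuples through an Append
 * node when every tuple is charged 'per_row_cost'.
 */
double
append_row_overhead(double rows, double per_row_cost)
{
	assert(rows >= 0 && per_row_cost >= 0);
	return rows * per_row_cost;
}

/*
 * Overhead saved when grouping runs below the Append, so that the Append
 * sees 'grouped_rows' instead of 'scanned_rows'.
 */
double
append_overhead_saved(double scanned_rows, double grouped_rows,
					  double per_row_cost)
{
	return append_row_overhead(scanned_rows, per_row_cost) -
		append_row_overhead(grouped_rows, per_row_cost);
}
```

Taking the example from the start of this thread (3 partitions of 1M rows each, 30 groups in total) and a charge of 0.01 per row, the Append above the scans would be charged 3000000 * 0.01 = 30000, while the Append above the per-partition aggregates would be charged 30 * 0.01 = 0.3.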

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#23

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: David Rowley (#22)

Re: Partition-wise aggregation/grouping

On Tue, Oct 10, 2017 at 3:15 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 10 October 2017 at 01:10, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached is a new patch set with HEAD at 84ad4b0 and all these review points
fixed. Let me know if I missed any. Thanks.

I've only really skimmed over this thread and only opened the code
enough to extract the following:
+ /* Multiply the costs by partition_wise_agg_cost_factor. */
+ apath->startup_cost *= partition_wise_agg_cost_factor;
+ apath->total_cost *= partition_wise_agg_cost_factor;
I've not studied how all the path plumbing is done, but I think
instead of doing this costing magic we should really stop pretending
that Append/MergeAppend nodes are cost-free. I think something like
charging cpu_tuple_cost per row expected through Append/MergeAppend
would be a better approach to this.

If you perform grouping or partial grouping before the Append, then in
most cases the Append will receive less rows, so come out cheaper than
if you perform the grouping after it. I've not learned the
partition-wise join code enough to know if this is going to affect
that too, but for everything else, there should be no plan change,
since there's normally no alternative paths. I see there's even a
comment in create_append_path() which claims the zero cost is a bit
optimistic.

+1. Partition-wise join will also benefit from costing Append
processing. Number of rows * width of join result compared with the
sum of that measure for joining relations decides whether Append node
processes more data in Append->Join case than Join->Append case.

Append node just returns the result of ExecProcNode(). Charging
cpu_tuple_cost may make it too expensive. In other places where we
charge cpu_tuple_cost there's some processing done to the tuple like
ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
Append's processing of the tuple.

Maybe we should try to measure the actual time spent in Append node
as a fraction of say time spent in child seq scans. That might give us
a clue as to how Append processing can be charged in terms of costing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#24

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: Ashutosh Bapat (#23)

Re: Partition-wise aggregation/grouping

On Tue, Oct 10, 2017 at 10:27 AM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Oct 10, 2017 at 3:15 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 10 October 2017 at 01:10, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached is a new patch set with HEAD at 84ad4b0 and all these review points
fixed. Let me know if I missed any. Thanks.

I've only really skimmed over this thread and only opened the code
enough to extract the following:
+ /* Multiply the costs by partition_wise_agg_cost_factor. */
+ apath->startup_cost *= partition_wise_agg_cost_factor;
+ apath->total_cost *= partition_wise_agg_cost_factor;
I've not studied how all the path plumbing is done, but I think
instead of doing this costing magic we should really stop pretending
that Append/MergeAppend nodes are cost-free. I think something like
charging cpu_tuple_cost per row expected through Append/MergeAppend
would be a better approach to this.

If you perform grouping or partial grouping before the Append, then in
most cases the Append will receive less rows, so come out cheaper than
if you perform the grouping after it. I've not learned the
partition-wise join code enough to know if this is going to affect
that too, but for everything else, there should be no plan change,
since there's normally no alternative paths. I see there's even a
comment in create_append_path() which claims the zero cost is a bit
optimistic.
+1.

Yes. Ashutosh and I discussed this off-list and concluded that we will need to
cost the Append node too, as having an extra GUC to control this is not a good
idea per se. Thanks for your vote too.

I will try doing this and see how the plans come out with it.

Partition-wise join will also benefit from costing Append
processing. Number of rows * width of join result compared with the
sum of that measure for joining relations decides whether Append node
processes more data in Append->Join case than Join->Append case.

Append node just returns the result of ExecProcNode(). Charging
cpu_tuple_cost may make it too expensive. In other places where we
charge cpu_tuple_cost there's some processing done to the tuple like
ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
Append's processing of the tuple.

Maybe we should try to measure the actual time spent in Append node
as a fraction of say time spent in child seq scans. That might give us
a clue as to how Append processing can be charged in terms of costing.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


#25

David Rowley

david.rowley@2ndquadrant.com

over 8 years ago

In reply to: Ashutosh Bapat (#23)

Re: Partition-wise aggregation/grouping

On 10 October 2017 at 17:57, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Append node just returns the result of ExecProcNode(). Charging
cpu_tuple_cost may make it too expensive. In other places where we
charge cpu_tuple_cost there's some processing done to the tuple like
ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
Append's processing of the tuple.

I don't think there's any need to invent any new GUC. You could just
divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really
costs, and with the standard costs the actual cost seems to be about
cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
realistic. create_set_projection_path() does something similar and
brincostestimate() does some similar magic and applies 0.1 *
cpu_operator_cost to the total cost.

# create table p (a int, b int);
# create table p1 () inherits (p);
# insert into p1 select generate_series(1,1000000);
# vacuum analyze p1;
# \q
$ echo "select count(*) from p1;" > p1.sql
$ echo "select count(*) from p;" > p.sql
$ pgbench -T 60 -f p1.sql -n

latency average = 58.567 ms

$ pgbench -T 60 -f p.sql -n
latency average = 72.984 ms

$ psql
psql (11devel)
Type "help" for help.

# -- check the cost of the plan.
# explain select count(*) from p1;
QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=16925.00..16925.01 rows=1 width=8)
-> Seq Scan on p1 (cost=0.00..14425.00 rows=1000000 width=0)
(2 rows)

# -- selecting from the parent is the same due to zero Append cost.
# explain select count(*) from p;
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=16925.00..16925.01 rows=1 width=8)
-> Append (cost=0.00..14425.00 rows=1000001 width=0)
-> Seq Scan on p (cost=0.00..0.00 rows=1 width=0)
-> Seq Scan on p1 (cost=0.00..14425.00 rows=1000000 width=0)
(4 rows)

# -- extrapolate the additional time taken for the Append scan and work out what the planner
# -- should add to the plan's cost, then divide by the number of rows in p1 to work out the
# -- tuple cost of pulling a row through the append.
# select (16925.01 * (72.984 / 58.567) - 16925.01) / 1000000;
?column?
------------------------
0.00416630302337493743
(1 row)

# show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
?column?
----------------
2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the
additional cost is consistent, but it's probably not worth being too
critical here.
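
The arithmetic in the psql session above can be restated as a small standalone sketch. The function names are hypothetical and this is not planner code; it only reproduces the extrapolation: scale the plan cost by the ratio of the two measured latencies, attribute the difference to the Append node, and spread it over the rows.

```c
#include <assert.h>

/*
 * Scale the plan cost by the observed latency ratio and attribute the extra
 * cost to the Append node, spread evenly over the rows it returned.
 */
double
implied_append_cost_per_tuple(double plan_cost,
							  double latency_without_append,
							  double latency_with_append,
							  double rows)
{
	assert(latency_without_append > 0 && rows > 0);
	return (plan_cost * (latency_with_append / latency_without_append)
			- plan_cost) / rows;
}

/* How many times the implied per-tuple cost fits into cpu_tuple_cost. */
double
cpu_tuple_cost_divisor(double cpu_tuple_cost, double implied_cost)
{
	assert(implied_cost > 0);
	return cpu_tuple_cost / implied_cost;
}
```

With the figures measured above (plan cost 16925.01, latencies 58.567 ms and 72.984 ms, 1000000 rows, cpu_tuple_cost 0.01), the first function yields roughly 0.00417 and the second roughly 2.4, matching the query results shown.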

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#26

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

over 8 years ago

In reply to: David Rowley (#25)

Re: Partition-wise aggregation/grouping

On Tue, Oct 10, 2017 at 1:31 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

I don't think there's any need to invent any new GUC. You could just
divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really
costs, and with the standard costs the actual cost seems to be about
cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
realistic. create_set_projection_path() does something similar and
brincostestimate() does some similar magic and applies 0.1 *
cpu_operator_cost to the total cost.

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
?column?
----------------
2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the
additional cost is consistent, but it's probably not worth being too
critical here.

This looks good to me. I think it should be a separate, yet very small patch.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#27

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Ashutosh Bapat (#26)

Re: Partition-wise aggregation/grouping

On Tue, Oct 10, 2017 at 6:00 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This looks good to me. I think it should be a separate, yet very small patch.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28

Jeevan Chalke

jeevan.chalke@enterprisedb.com

over 8 years ago

In reply to: David Rowley (#25)

1 attachment(s)

Re: Partition-wise aggregation/grouping

On Tue, Oct 10, 2017 at 1:31 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

On 10 October 2017 at 17:57, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Append node just returns the result of ExecProcNode(). Charging
cpu_tuple_cost may make it too expensive. In other places where we
charge cpu_tuple_cost there's some processing done to the tuple like
ExecStoreTuple() in SeqNext(). Maybe we need some other measure for
Append's processing of the tuple.

I don't think there's any need to invent any new GUC. You could just
divide cpu_tuple_cost by something.

I did a quick benchmark on my laptop to see how much Append really
costs, and with the standard costs the actual cost seems to be about
cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be
realistic. create_set_projection_path() does something similar and
brincostestimate() does some similar magic and applies 0.1 *
cpu_operator_cost to the total cost.

# create table p (a int, b int);
# create table p1 () inherits (p);
# insert into p1 select generate_series(1,1000000);
# vacuum analyze p1;
# \q
$ echo "select count(*) from p1;" > p1.sql
$ echo "select count(*) from p;" > p.sql
$ pgbench -T 60 -f p1.sql -n

latency average = 58.567 ms

$ pgbench -T 60 -f p.sql -n
latency average = 72.984 ms

$ psql
psql (11devel)
Type "help" for help.

# -- check the cost of the plan.
# explain select count(*) from p1;
QUERY PLAN
------------------------------------------------------------------
Aggregate (cost=16925.00..16925.01 rows=1 width=8)
-> Seq Scan on p1 (cost=0.00..14425.00 rows=1000000 width=0)
(2 rows)

# -- selecting from the parent is the same due to zero Append cost.
# explain select count(*) from p;
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=16925.00..16925.01 rows=1 width=8)
-> Append (cost=0.00..14425.00 rows=1000001 width=0)
-> Seq Scan on p (cost=0.00..0.00 rows=1 width=0)
-> Seq Scan on p1 (cost=0.00..14425.00 rows=1000000 width=0)
(4 rows)

# -- extrapolate the additional time taken for the Append scan and work out what the planner
# -- should add to the plan's cost, then divide by the number of rows in p1 to work out the
# -- tuple cost of pulling a row through the append.
# select (16925.01 * (72.984 / 58.567) - 16925.01) / 1000000;
?column?
------------------------
0.00416630302337493743
(1 row)

# show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)

# -- How does that compare to the cpu_tuple_cost?
# select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743;
?column?
----------------
2.400209476818
(1 row)

Maybe it's worth trying with different row counts to see if the
additional cost is consistent, but it's probably not worth being too
critical here.

I ran exactly the same tests on my local developer machine to arrive at this
factor. With parallelism enabled I got 7.9. However, if
I disable parallelism (and I believe David disabled it too), I get
1.8. For 10000 rows, I get 1.7.

-- With Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 / 72.450) - 10633.56) / 1000000);
7.9

-- Without Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838 / 131.400) - 16925.01) / 1000000);
1.8

-- With 10000 rows (so no Gather too)
# select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 / 1.424) - 170.01) / 10000);
1.7

So it is not so straightforward to come up with the correct heuristic here.
Thus using 50% of cpu_tuple_cost looks good to me.

As suggested by Ashutosh and Robert, I have attached a separate small WIP patch
for it.

I think it will be better if we take this topic to another mail thread.
Do you agree?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

pg_cost_append_v1.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ce32b8a4..c0bf602 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1742,6 +1742,33 @@ cost_sort(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_append
+ *	  Determines and returns the cost of an Append node.
+ *
+ * Though Append doesn't do any selection or projection, it's not free.  So we
+ * try to add cost per input tuple which is arbitrarily calculated as
+ * DEFAULT_APPEND_COST_FACTOR * cpu_tuple_cost.
+ *
+ * 'input_startup_cost' is the sum of the input streams' startup costs
+ * 'input_total_cost' is the sum of the input streams' total costs
+ * 'tuples' is the number of tuples in all the streams
+ */
+void
+cost_append(Path *path,
+			Cost input_startup_cost, Cost input_total_cost,
+			double tuples)
+{
+	Cost		startup_cost = 0;
+	Cost		run_cost = 0;
+
+	/* Add Append node overhead. */
+	run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
+	path->startup_cost = startup_cost + input_startup_cost;
+	path->total_cost = startup_cost + run_cost + input_total_cost;
+}
+
+/*
  * cost_merge_append
  *	  Determines and returns the cost of a MergeAppend node.
  *
@@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
 	 */
 	run_cost += cpu_operator_cost * tuples;
 
+	/* Add MergeAppend node overhead like we do it for the Append node */
+	run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
 	path->startup_cost = startup_cost + input_startup_cost;
 	path->total_cost = startup_cost + run_cost + input_total_cost;
 }
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2d491eb..c8fdf1c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1212,6 +1212,8 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 				   int parallel_workers, List *partitioned_rels)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
+	Cost		input_startup_cost;
+	Cost		input_total_cost;
 	ListCell   *l;
 
 	pathnode->path.pathtype = T_Append;
@@ -1227,32 +1229,31 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 	pathnode->subpaths = subpaths;
 
 	/*
-	 * We don't bother with inventing a cost_append(), but just do it here.
-	 *
-	 * Compute rows and costs as sums of subplan rows and costs.  We charge
-	 * nothing extra for the Append itself, which perhaps is too optimistic,
-	 * but since it doesn't do any selection or projection, it is a pretty
-	 * cheap node.
+	 * Add up the sizes and costs of the input paths.
 	 */
 	pathnode->path.rows = 0;
-	pathnode->path.startup_cost = 0;
-	pathnode->path.total_cost = 0;
+	input_startup_cost = 0;
+	input_total_cost = 0;
 	foreach(l, subpaths)
 	{
 		Path	   *subpath = (Path *) lfirst(l);
 
 		pathnode->path.rows += subpath->rows;
-
-		if (l == list_head(subpaths))	/* first node? */
-			pathnode->path.startup_cost = subpath->startup_cost;
-		pathnode->path.total_cost += subpath->total_cost;
 		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
 			subpath->parallel_safe;
 
+		if (l == list_head(subpaths))	/* first node? */
+			input_startup_cost = subpath->startup_cost;
+		input_total_cost += subpath->total_cost;
+
 		/* All child paths must have same parameterization */
 		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
 	}
 
+	/* Now we can compute total costs of the Append */
+	cost_append(&pathnode->path, input_startup_cost, input_total_cost,
+				pathnode->path.rows);
+
 	return pathnode;
 }
 
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index 306d923..cd42e5a 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -31,6 +31,12 @@
 
 #define DEFAULT_EFFECTIVE_CACHE_SIZE  524288	/* measured in pages */
 
+/*
+ * Arbitrarily use 50% of the cpu_tuple_cost to cost append node. Note that
+ * this value should be multiplied with cpu_tuple_cost wherever applicable.
+ */
+#define DEFAULT_APPEND_COST_FACTOR 0.5
+
 typedef enum
 {
 	CONSTRAINT_EXCLUSION_OFF,	/* do not use c_e */
@@ -106,6 +112,9 @@ extern void cost_sort(Path *path, PlannerInfo *root,
 		  List *pathkeys, Cost input_cost, double tuples, int width,
 		  Cost comparison_cost, int sort_mem,
 		  double limit_tuples);
+extern void cost_append(Path *path,
+			Cost input_startup_cost, Cost input_total_cost,
+			double tuples);
 extern void cost_merge_append(Path *path, PlannerInfo *root,
 				  List *pathkeys, int n_streams,
 				  Cost input_startup_cost, Cost input_total_cost,
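
As a rough check of what the WIP patch above would do to the earlier example plan, the following standalone sketch mirrors the arithmetic of cost_append() without the planner's Path structure. The AppendCost struct and the name sketch_cost_append() are assumptions made for this illustration; only the formula is taken from the patch.

```c
#include <assert.h>

/* Mirrors DEFAULT_APPEND_COST_FACTOR from the patch. */
#define APPEND_COST_FACTOR 0.5

/* Hypothetical result type; the patch writes into a Path instead. */
typedef struct AppendCost
{
	double		startup_cost;
	double		total_cost;
} AppendCost;

/*
 * Same arithmetic as the patch's cost_append(): the Append adds
 * cpu_tuple_cost * factor per tuple on top of the summed input costs.
 */
AppendCost
sketch_cost_append(double cpu_tuple_cost,
				   double input_startup_cost, double input_total_cost,
				   double tuples)
{
	AppendCost	result;
	double		run_cost;

	assert(tuples >= 0);
	run_cost = cpu_tuple_cost * APPEND_COST_FACTOR * tuples;
	result.startup_cost = input_startup_cost;
	result.total_cost = run_cost + input_total_cost;
	return result;
}
```

For the plan shown earlier (subplans costing 0.00..14425.00 in total, 1000001 rows, cpu_tuple_cost 0.01), this gives an Append total cost of 14425.00 + 0.01 * 0.5 * 1000001 = 19425.005 instead of 14425.00.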

#29

David Rowley

david.rowley@2ndquadrant.com

over 8 years ago

In reply to: Jeevan Chalke (#28)

Re: Partition-wise aggregation/grouping

On 13 October 2017 at 19:36, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I ran exactly the same tests on my local developer machine to arrive at this
factor. With parallelism enabled I got 7.9. However, if
I disable parallelism (and I believe David disabled it too), I get
1.8. For 10000 rows, I get 1.7.

-- With Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 /
72.450) - 10633.56) / 1000000);
7.9

-- Without Gather
# select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838 /
131.400) - 16925.01) / 1000000);
1.8

-- With 10000 rows (so no Gather too)
# select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 /
1.424) - 170.01) / 10000);
1.7

So it is not so straightforward to come up with the correct heuristic here.
Thus using 50% of cpu_tuple_cost looks good to me.

As suggested by Ashutosh and Robert, I have attached a separate small WIP
patch for it.

Good to see it stays fairly consistent at different tuple counts, and
is not too far away from what I got on this machine.
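
For readers who want to check the arithmetic without a psql session, the three expressions quoted above can be recomputed as follows. This is only a sketch: it assumes cpu_tuple_cost is at its stock default of 0.01, and the names `implied_factor`, `slower_ms` and `faster_ms` are illustrative labels for the two timings in each expression, which this part of the thread does not restate.

```python
# Recompute the three ratios from the psql expressions quoted above.
# Assumption: cpu_tuple_cost is at its stock default of 0.01.
CPU_TUPLE_COST = 0.01


def implied_factor(plan_cost, slower_ms, faster_ms, ntuples,
                   cpu_tuple_cost=CPU_TUPLE_COST):
    """Return cpu_tuple_cost divided by the implied extra cost per tuple.

    plan_cost is scaled by the ratio of the two timings; the increase over
    plan_cost, spread across ntuples, is the implied per-tuple overhead.
    """
    extra_cost = plan_cost * (slower_ms / faster_ms) - plan_cost
    per_tuple_overhead = extra_cost / ntuples
    return cpu_tuple_cost / per_tuple_overhead


with_gather = implied_factor(10633.56, 81.035, 72.450, 1_000_000)
without_gather = implied_factor(16925.01, 172.838, 131.400, 1_000_000)
small_table = implied_factor(170.01, 1.919, 1.424, 10_000)

for label, value in [("with Gather", with_gather),
                     ("without Gather", without_gather),
                     ("10000 rows", small_table)]:
    print(f"{label}: {value:.2f}")
```

A factor of N means the implied per-tuple overhead is cpu_tuple_cost / N, so the two serial results, both a little under 2, correspond to an overhead of somewhat more than half of cpu_tuple_cost.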

I looked over the patch and saw this:

@@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
*/
run_cost += cpu_operator_cost * tuples;

+ /* Add MergeAppend node overhead like we do it for the Append node */
+ run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
  path->startup_cost = startup_cost + input_startup_cost;
  path->total_cost = startup_cost + run_cost + input_total_cost;
 }

You're doing that right after a comment that says we don't do that. It
also looks like the "run_cost += cpu_operator_cost * tuples" line is
trying to do the same thing, so perhaps it's worth just replacing that
line, which with the default settings will double that additional cost,
although doing so would make the planner prefer a MergeAppend over an
Append slightly more than it did previously.
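
The trade-off described here can be made concrete with a small sketch. It assumes the stock defaults cpu_tuple_cost = 0.01 and cpu_operator_cost = 0.0025, assumes that cost_append() in the patch charges cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR per tuple (its body is not shown in this part of the thread), and looks only at the per-node overhead term, ignoring the heap comparison cost that cost_merge_append also adds. The function names and the three variant labels are illustrative.

```python
CPU_TUPLE_COST = 0.01        # stock default
CPU_OPERATOR_COST = 0.0025   # stock default
APPEND_COST_FACTOR = 0.5     # DEFAULT_APPEND_COST_FACTOR in the patch


def append_overhead(tuples):
    """Per-node overhead assumed for Append under the patch."""
    return CPU_TUPLE_COST * APPEND_COST_FACTOR * tuples


def merge_append_overhead(tuples, variant):
    """Per-node overhead for MergeAppend under three variants.

    "before"  : existing code, cpu_operator_cost per tuple
    "patch"   : existing charge plus the new Append-style charge
    "replace" : the new charge replaces the existing one
    """
    existing = CPU_OPERATOR_COST * tuples
    added = CPU_TUPLE_COST * APPEND_COST_FACTOR * tuples
    if variant == "before":
        return existing
    if variant == "patch":
        return existing + added
    if variant == "replace":
        return added
    raise ValueError(f"unknown variant: {variant}")


tuples = 1_000_000
gaps = {
    # Before the patch, Append simply summed its children's costs,
    # so it carried no per-tuple charge of its own.
    "before": merge_append_overhead(tuples, "before") - 0.0,
    "patch": merge_append_overhead(tuples, "patch") - append_overhead(tuples),
    "replace": merge_append_overhead(tuples, "replace") - append_overhead(tuples),
}
for variant, gap in gaps.items():
    print(f"{variant}: MergeAppend overhead exceeds Append overhead by {gap:.1f}")
```

Under these assumptions the patch as posted keeps the gap between the two node types unchanged, while replacing the existing line doubles the MergeAppend charge but removes the gap, which is why MergeAppend would look slightly more attractive than before.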

+#define DEFAULT_APPEND_COST_FACTOR 0.5

I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. It
means very little by itself. It also seems that most of the other cost
functions just use the magic number.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: David Rowley (#29)

Re: Partition-wise aggregation/grouping

On Fri, Oct 13, 2017 at 1:13 PM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

I looked over the patch and saw this:

@@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root,
*/
run_cost += cpu_operator_cost * tuples;
+ /* Add MergeAppend node overhead like we do it for the Append node */
+ run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
path->startup_cost = startup_cost + input_startup_cost;
path->total_cost = startup_cost + run_cost + input_total_cost;
}
You're doing that right after a comment that says we don't do that. It
also looks like the "run_cost += cpu_operator_cost * tuples" line is
trying to do the same thing, so perhaps it's worth just replacing that
line, which with the default settings will double that additional cost,
although doing so would make the planner prefer a MergeAppend over an
Append slightly more than it did previously.

I think we can remove that code block entirely. I have already added
relevant comments around DEFAULT_APPEND_COST_FACTOR.
However, I am not sure about doing this because, as you correctly said, the
planner may then prefer a MergeAppend to an Append. Would it be fine to
remove that code block?

+#define DEFAULT_APPEND_COST_FACTOR 0.5

I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. It
means very little by itself. It also seems that most of the other cost
functions just use the magic number.

Agreed, but those magic numbers are each used in only one place. Here there
are two places, so anyone who wants to change the value needs to make sure
to update both. To minimize that risk, having a #define seems better.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#31

Dilip Kumar

dilipbalaut@gmail.com

about 8 years ago

In reply to: Jeevan Chalke (#28)

Re: Partition-wise aggregation/grouping

On Fri, Oct 13, 2017 at 12:06 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

While playing around with the patch I have noticed one regression with
the partial partition-wise aggregate.

I am consistently able to reproduce this on my local machine.

Scenario: Group by on non-key column and only one tuple per group.

Complete Test:
--------------------
create table t(a int,b int) partition by range(a);
create table t1 partition of t for values from (1) to (100000);
create table t2 partition of t for values from (100000) to (200000);

insert into t values (generate_series(1,199999),generate_series(1, 199999));
postgres=# explain analyze select sum(a) from t group by b;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=20379.55..28379.51 rows=199999 width=12) (actual time=102.311..322.969 rows=199999 loops=1)
   Group Key: t1.b
   ->  Merge Append  (cost=20379.55..25379.53 rows=199999 width=12) (actual time=102.303..232.310 rows=199999 loops=1)
         Sort Key: t1.b
         ->  Partial GroupAggregate  (cost=10189.72..11939.70 rows=99999 width=12) (actual time=52.164..108.967 rows=99999 loops=1)
               Group Key: t1.b
               ->  Sort  (cost=10189.72..10439.72 rows=99999 width=8) (actual time=52.158..66.236 rows=99999 loops=1)
                     Sort Key: t1.b
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t1  (cost=0.00..1884.99 rows=99999 width=8) (actual time=0.860..20.388 rows=99999 loops=1)
         ->  Partial GroupAggregate  (cost=10189.82..11939.82 rows=100000 width=12) (actual time=50.134..102.976 rows=100000 loops=1)
               Group Key: t2.b
               ->  Sort  (cost=10189.82..10439.82 rows=100000 width=8) (actual time=50.128..63.362 rows=100000 loops=1)
                     Sort Key: t2.b
                     Sort Method: external merge  Disk: 1768kB
                     ->  Seq Scan on t2  (cost=0.00..1885.00 rows=100000 width=8) (actual time=0.498..20.977 rows=100000 loops=1)
 Planning time: 0.190 ms
 Execution time: 339.929 ms
(18 rows)

postgres=# set enable_partition_wise_agg=off;
SET
postgres=# explain analyze select sum(a) from t group by b;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=26116.53..29616.51 rows=199999 width=12) (actual time=139.413..250.751 rows=199999 loops=1)
   Group Key: t1.b
   ->  Sort  (cost=26116.53..26616.52 rows=199999 width=8) (actual time=139.406..168.775 rows=199999 loops=1)
         Sort Key: t1.b
         Sort Method: external merge  Disk: 3544kB
         ->  Result  (cost=0.00..5769.98 rows=199999 width=8) (actual time=0.674..76.392 rows=199999 loops=1)
               ->  Append  (cost=0.00..3769.99 rows=199999 width=8) (actual time=0.672..40.291 rows=199999 loops=1)
                     ->  Seq Scan on t1  (cost=0.00..1884.99 rows=99999 width=8) (actual time=0.672..12.408 rows=99999 loops=1)
                     ->  Seq Scan on t2  (cost=0.00..1885.00 rows=100000 width=8) (actual time=1.407..11.689 rows=100000 loops=1)
 Planning time: 0.146 ms
 Execution time: 263.678 ms
(11 rows)

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#32

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Dilip Kumar (#31)

Re: Partition-wise aggregation/grouping

On Tue, Oct 17, 2017 at 7:13 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Fri, Oct 13, 2017 at 12:06 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

While playing around with the patch I have noticed one regression with
the partial partition-wise aggregate.

I am consistently able to reproduce this on my local machine.

Scenario: Group by on non-key column and only one tuple per group.

I didn't get what you mean by regression here. Can you please explain?

I see that, when enabled, the PWA plan is selected over the regular plan on
the basis of costing.
The regular plan needs a Result node, which increases its cost, whereas the
PWA plan doesn't need one and thus wins.
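
The size of that Result node charge can be read off the EXPLAIN output in message #31. The sketch below is only a back-of-the-envelope check using the cost figures printed there; it assumes the costs of the nodes above the Result are purely additive, so that dropping the Result node would lower the regular plan's total by exactly the Result node's own overhead. The variable names are illustrative.

```python
# Figures copied from the two EXPLAIN ANALYZE plans in message #31.
ROWS = 199_999
RESULT_TOTAL_COST = 5769.98     # Result node in the regular plan
APPEND_TOTAL_COST = 3769.99     # Append node directly below it
REGULAR_PLAN_TOTAL = 29616.51   # GroupAggregate, regular plan
PWA_PLAN_TOTAL = 28379.51       # Finalize GroupAggregate, PWA plan

# Cost added by the Result node itself, in total and per row.
result_overhead = RESULT_TOTAL_COST - APPEND_TOTAL_COST
per_row = result_overhead / ROWS

# Hypothetical total for the regular plan if the Result node were free.
regular_without_result = REGULAR_PLAN_TOTAL - result_overhead

print(f"Result node overhead: {result_overhead:.2f} ({per_row:.4f} per row)")
print(f"PWA advantage as costed: {REGULAR_PLAN_TOTAL - PWA_PLAN_TOTAL:.2f}")
print(f"Regular plan without Result: {regular_without_result:.2f}")
```

Under that assumption the Result node accounts for more than the whole cost difference between the two plans, which is consistent with the explanation that the Result node is what tips the choice toward PWA.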

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#33

Dilip Kumar

dilipbalaut@gmail.com

about 8 years ago

In reply to: Jeevan Chalke (#32)

Re: Partition-wise aggregation/grouping

On Tue, Oct 17, 2017 at 10:44 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I didn't get what you mean by regression here. Can you please explain?

I see that, when enabled, the PWA plan is selected over the regular plan on
the basis of costing.
The regular plan needs a Result node, which increases its cost, whereas the
PWA plan doesn't need one and thus wins.

Sorry for not explaining clearly. I meant that with the normal plan the
execution time is 263.678 ms, whereas with PWA it's 339.929 ms.

I only set enable_partition_wise_agg=on; the planner switched to PWA and
the execution time increased by about 30%.
I understand that this is the worst case for PWA, where the
Finalize GroupAggregate gets all the tuples.
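
To put the two plans side by side, the following sketch compares the planner's estimated total cost with the measured execution time, using only the figures from the EXPLAIN ANALYZE output in message #31. The helper name `pct_change` is illustrative.

```python
def pct_change(new, old):
    """Percentage change from old to new; negative means new is lower."""
    return (new - old) / old * 100.0


# Top-node total cost and execution time of each plan, from message #31.
pwa = {"total_cost": 28379.51, "exec_ms": 339.929}
regular = {"total_cost": 29616.51, "exec_ms": 263.678}

cost_change = pct_change(pwa["total_cost"], regular["total_cost"])
time_change = pct_change(pwa["exec_ms"], regular["exec_ms"])

print(f"estimated cost, PWA vs regular: {cost_change:+.1f}%")
print(f"execution time, PWA vs regular: {time_change:+.1f}%")
```

The estimate favors PWA by a few percent while the measured run is slower by a much larger margin, which is the mismatch being reported here.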

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


#34

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Dilip Kumar (#33)

1 attachment(s)

Re: Partition-wise aggregation/grouping

Hi,

Attached is a new patch-set. Changes include:

1. Added a separate patch for costing the Append node, as discussed; it
comes first in the patch-set.
2. Since we now cost the Append node, we no longer need the
partition_wise_agg_cost_factor GUC, so I removed it. What remained of that
patch is merged into the main implementation patch.
3. Updated the row counts in the test cases so that we get partition-wise
plans.

Thanks

On Wed, Oct 18, 2017 at 9:53 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Oct 17, 2017 at 10:44 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I didn't get what you mean by regression here. Can you please explain?

I see that, when enabled, the PWA plan is selected over the regular plan on
the basis of costing.
The regular plan needs a Result node, which increases its cost, whereas the
PWA plan doesn't need one and thus wins.

Sorry for not explaining clearly. I meant that with the normal plan the
execution time is 263.678 ms, whereas with PWA it's 339.929 ms.

I only set enable_partition_wise_agg=on; the planner switched to PWA and
the execution time increased by about 30%.
I understand that this is the worst case for PWA, where the
Finalize GroupAggregate gets all the tuples.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v6.tar.gz (application/x-gzip)

���Y�\�s�F��W�����DH��|h�q�����95�����&�h4`Y����^w��u$�S�LJ4�>��{G?`�&i�q�����u�����Q?=�L�c��O�=�[�����h���G�'��h4�<a��]F�'��0��7�?���v���7���v~�+�K����>����� Zw��F��A���y�I���M7�Di��?�Med�d2����x2j�������$`�������I�e�>�{��F���	��������?���r����p�~�#v�w�?e��3��
������������^H�9���G)Ov	H��t�x{n}����!	�l0e?{)�#��z�g�	�{�a�����q/}������Wo��;�o�e��l2-�L�&������l2�M�	���Uy(��eu:��x���}����wi�
����.t#�������?�2���)},�g� �v���f�9�@��'�i�Mf��!�:�����t�A���Q�Y��
"�f�����#�����l:c}�����|���9���u�Ix�_s@���c���g^��
7\�,�O'����*fgI�m�t��.�W�h�����m����6);������������'Z�q�@p79Ms�J7�I���O�A�"��b�~����b�
v�;������1��B��B��"�i���e��l��c�tz/E��2f��`�l�,�0��H"��{��by6���cf�?���D3`�B	G�Ni�A��I������� t��/;c?��,�IIv���U�9�|����n�n8�I���R���������/���S�;��kzg��:Ix�o����)��r�+����4�G�"���e)��o��0��Cl����>��U�[/yV�v���X[�pvM��B#���q�~}�A�i���)w�~[�\!���D��v������_\L��
�8r��o] ���N����E��	��p�W}��f'�P9�\j�����������L�x��^��4�T�T���/`.w��I��JfI�@���cS�&5��Z(�g"[�m����5vv��U�_C����%�x!�OTw�����%����oUo5�T�]����~"��X/�&��	����j�.�t�$��>���x!�F�����p�����=K���;��y�o��Z=`�-�/'��2�s�)QZ��Z����D$�GLS�+M��~���j�u��	#�"�"�;P�M�m������+��4����[��
Y6���]�����tS�h#�'�%����X�C��&P_��-T���q�Y�Rp�tx��*��W��-,5W��"m�j��v�����qx��lG�+�;M����
l���g)�����%���y_Qf�(����)����8�R����+���O��%N
L]� �B;n�%�.��j
:�<�����.T4����C��+]P�~8B�!���>��,���=����&M"�;��C�Mx>��M�&�"�B����#J2�D��\�lA�q
���6[��4�n�6��e�V{��� .a/9N{���jxn����e0-xi P��C�M��/���hf�!@���	Q�0�2!�~��oI'���h(]� U�/+�s$+�G�\A�M����t�j��hK���@K��p���&(���Q}$#c�����������a�Mn�x��L�s�d�W&}�)�����D��vP���	�On�iS��9��B��x��|����k�=���}��-0a��,�h��|��e�8����$�&�1��x���T��s3����E��e�69������{�TB��'�#�)�'
�)�'�	&���x�v9}����,.��qV�Z�m1�7<��'b��k�V#����	(~���-ET����l�-���Xm*h%��J[���G��S5e��CQ�^��@J�I�5C)�%eo��������t0r:~d3fj�*�����Z��FJ�#J���$�q�$��Q�}K��QY�74�)��_\d������q`������c#!=}=$���DAv+��;�@U�,�\����=�}
K4�)Y�{E<�c<�����Naq9D.�;��;3g��-�Q�����N;��[ |��(�k�.��_��lt���~2��������:������V��p5��g=���~�\F�)��no5w�?��w8~6��9����o.��,�����m��V������Y��=�Z�o ����~#P�-��}:�B���� �Ba#�z��=���h���*\��C���
9�����@:
��`�	���X�6XoR�3xD8���Z+�<���%� u�aT���O�$�a����
����B`�c���j	�[F��ao�k@���i�:!	�3�XI����-\S���{,*H��V3���]��r�L��	BQ�YRn������H'��=��D�1�
��WxaFQ}��w��X}"�@H�����F���u
k�4m�>}�1�]��N��� ��a�3l�4z�M�T;3�@l$��^��K���gK�*'��j\�I��+��bW2��@��ON��s-(m$�0������Bj����-������E�.%p$}�����Z�2���]q�q�/G���H��mX����`n�v�	&7&���6DQ/hd�/���U8�W�*���s+��9(�� z�n|�p��B�����W��S���rD��"����\�����(#!��O�	G���G� b�
�^�����f�������������?�n�60{o?�j8-�#��7����=��k������/�*]��?���m��c����&�~sUU���� �m ��:��q����lt���A/���A�9�b���k�]��Z+�9Fmg�]�~����{���+�F4��n��y�Q�z�����4]�h�~W��O�:��l��@���5�SEi7���+����������DQ6��`���J��J��{KU|��Io����F���-W|��,�����#�!x�X�E��1Y\����E�)%D�-���y��<��qy/�F�sv���R�|�E�)wQ_��&x����U�������/���t$�50O�g�Cw���O����s5:{{���$l�|oZ5a'/PF���������|@���/��o]��1��3��,���"���&��[��S�Y��X
�X�d�#������ ��1�@q�{�
���8���*��q�a�Dw@n�o����,M�L�����X�oE���r�T�~bm��e�
l�'-�J��,Hp	{&�%>d�1�z�h1��[W9
T�����������9�N�gP[�s�T�\+� z����A	n���[�Ln`�AgZz�� �a7<�W+���!H�)
�R��Jike=����dN�&�[�94��jv���Q58��V�$�j^�m���a��{C�A0=����U��UM������A1}�u1��^������N��e�4�<��G��V{��]Q�]S��0V�ZP��O�#���&!�#����\��0�+��ry�B��<.���m�~��������c�o����8#))l\] m,���}@�-�%oWhp����R����V
c P\d�f����-���_F�"�n
�-�C�3�$�Vy/�I�	"�����#.
X�eW�j�j0fJ��X�P���IaE[��p�VW�y���W�'��Cx����#��M��C���\�7����?</h`��(��XbL�`�#�Bi�P�
�s��4,�������q�<��M��Q%x�b`�p����Wl�\�l����7�]�����RLi������-��"%/#���-�K2�)4�.�@3R2�N	�E^�`��H##}�t�f����Z��\R��DB�;�W���D7������v��������o�$�\$8q�4��K"�����T��E���-��3�S��P��z8�B���6)7�����N��-9[����S�T��@�w|��R�AQM,�����"���T��l���<��#e
9R�����U�T�EZ`���I�5$�P������1�|/S"4^Vr<�=�'|,f�F|Lbm��^��B���JZv�!GR���|�Q�-UR4{�!*�S������l�8}>X����\��t���J[t�������wV}pi�!����'kndJ�+ac�l�p��+k
�O*%
�E�>�Ze��$�%�yC�0W)����&m��x�Z_N�|�3��4����k�*�R�U���HzW4�>��������"g �u�s<���p��Ty&�%��b����c��'� 3p����R��{Yav�^VM���k3�9�w��Z�x�/����%)o�K�a�o��1>�RS�m��.��_����(���*��������d�1���I����$�RK�{���U��,�����T��T��e'�.�P����4����K�"��~����������mL(�?����7�E��'Umx1b�|r<��k��������VU����uG4��}y0'>�����)M�zQ=���h��L	��<�����)� ��	&��#�>A����W(�uY:`j0��c:2N�p83�Z�0�6eC:������G��.{���4v�8M�x��c5�C)�e��Q������s��?��(c4�<o!G
��JX��<�2�bZ�h4�����=(O&�(�AwK�a���CmX��[c�/�
����8��0��?6'����*�'��z�9h�h6i��D<d��|�#��Q0tA����b|X8q1��?{a�%�)
����j���P~J�T-�2�i�XQy.��2<���B�6����|��/��U�yx@�����o+OW��b!�[5�i����7q
���Sm�Lm��L�����
4�"�a�-gQ�D�T�I2;�������.���1����X~x�z��.WN����!^�����j��]���P!�)�������4�����s�(V*e�^�L[��^�����2.�d�G���6�x����6R�P����������\�9
��0�n�Q��1���*��_���W#w�����P������*%>��������!���(������x6T��D2r������H���!�(���{�Y�OE�j�r=��������l�t~wb�U��*��]R��:��'����X�D��� ��)\��a��sM��x���p�vG�C�������,	*P��AK����Ao��R�����M�^�cH{��V�A�����S~�m�����zO�^@�?�g�N���V�����������{?v�2Q����o�sw�V���
,���<��o���f�=�+o����{8d���*+Z
����V�W8T�Uw������CLx����t�r�T&��9����������'2���� �0B�=�&��{�����"�����@uR��v���Q�z���h��E��r5�
g6w{����sc�[�E��2�;��?�")�iU�CROl����}<;p{���
��E��AK�H^j���yK�7U�	��d^�&�����
��l�Y6�
�I�q��h�s�G���4����+�Vn��NG������
Y��5)������G:T��OA��Z�+*�x��S��Pg(J��:��#�F@E#����w��/��]�{y�^��F
��Cr����w��/�Ai�2���?x�Rwy�������,~y����Cz��G�)�k�[}plE<o=�!(����am�Qz>��\&t����'�:�����/g��h�8�=����e�>�~��'	_}���~��#����6���kJ�v{^�Xz�d��eX��!M��Z��l����z��AS|����
��a�@���X�>�}c�D���^-�vA�N���:��M�t4�\\u�C|����Hq��������x��e��a�L�N�2��s[O?�mK�9�j7�W;��k��/�+�#	�-��c����V�������=`]�D�+M �a�b��7��@N������C:����o:|��z(�<I|��Ax��
�.U���+B�@Bd�>h��J'��z�@�	Z8\���i������F�W��@��n�XRDR/'MZ7q�n�qj;�����CI��F��z$�����bf >$�����]�$���0�����H��F��U�Z���.p:�|Z���g?�"%��U��,�E�T��F����o'FD�R�013J�C��/_���f�NXO8T��9�HX+��$Z��<��M/g�h|5}���r���.�F�m�!��]�8�Y_G�W����C"���lD@\ZF�$e��X>���������@Ju64@	j%T !BQ��H�0f?'?��KN��"�������%��Q�'$����;j"x}��3���0��);���$��5��|zQ�������������t�B����EY���1���%r�����Ct����_�D������8���t4�!h���B���t�Ay�l��;�c�33`���
�O�P
����pH�����-}Re�>���o�=H��"�S�S(c��c�m�q�$\r��#X�8���v�7���{��u��%�W(�S�E�aD����`H�z�����N�e��`J�o�����X��d����9i�ixFv��R�[�%�1��0�%�B�VF@&w���2u��i�:��?@����$CpRL�
5�Cz��"�����
"V*��d$������p���J7���k�]��t�u����c���g�����'������v2.�� �:�41W���R�-���^�=|j�C#\'I���pPA��U6���I�{���z�yOR/I1(M�DX9s-~�T�5�Z�5�'4c-���`�FV�Y�2��`	BW�
�Ht�!e2qx�c	��R&�P*Z��s���r�r�K=��{A&��y� �^���g5r@��d�����,te4�&C�Wd�T?����"V�*Xb��5��4^j�Xu}d��O�c�e1�����x���U,�o��r�Sfi�-9=��pe�OS��5���<�8C��
��>d�k�b/�0�������1x�_��d6���@�����hE'a��@�Q�?D�b�u��JT�j]��e��x��',���L�(�wx�vl���A�X�p�����Jx+Y?�+��=eh|Nkv����O��1y ��	�sy�Kr�#Vo����&��A�o
\DR��^OV��Yq.�Y�F��j&���!fMFym*"bA��>��Z�H4�G��(�����V�7����C�'��h�1Z�I#r��9$�f�F����������W�g����_�J%�M`��l��oqP�K�J���*���N���6G$A��j�Az��U2F���������D���?���x��������4���p���!	����7�����?)'NHa�${]�C}a���aO�
���m��R��t=�Vm�T*i4s�����SNG�,V����q�$C�	���TNdN�(~@��p���H���}5?��!<��]�i2�f�Dh����PG'$�<�@�����!�������j�t�T�e���� o{�\�DA�
����P.T� �Ty�P��3�����f1�s�����0��R�������u����F*5Gk���0*�PB��=�m��$�=�`H������U�`s��L_�'z�[���{S���NB����B��;��v����FD�'���S���
KGl��C�5�����7�����%�*�zy�nA&F�����$l�~���f�)����O#�l���aq=FgM�*��h$W���,����b��������O����R�����45�����F����R���P��3�����mP�q��te����6K�4�B�gl9���Q8d�gl�H��G<8�����
��V�< �,�0��=��-��.�IE�l���FE$���r��X��������I�<�09	��>`Q��!��
Bj�J��`�nhj��@<���7{��9�q���t��W�c�����������V���'X��a:2i�[D�� ��t����m01����Q���K�>��,*
8 ������1"�uF�J�fq��D�
��>��>������3���k��]�Y� �h%���#����H�jZ�~���N���e���h�#�c�Si����&�	�4�D9u���:Y�tO���A������Y�zU��c3Fib.Uf>��2sE�O�|�X=��FC�9������(y@�G2�x�k~�h	�"2���k������R���
.f��b����Z��������691��T�K%��-Z�	�%>@��S��&�3��������9�v��O�H�U��{0���!�{�e,�h	�P����S��2\V��$�F^��)�MI����UU��[|6q��?4P����6lm����^Zi�,��l��0F��_?{{)����S'@��Pz!����t�2���*�`��(��:��@�[���l�N��@C(��������y� m�rb�_5���(��f,dV*���Zvd;����8g��`�(�����`���X�	�C�bE��3�����$W�n
��E����hZ������� ��������U��A��1���?ck�f	%����igY�t��ViO%	vU��$]�3�MC�b�b�*R��L�)����K�}�J:D&�]\�L���"H8V����T���:m�����z�V"��~R|M��n���/8�|��0\���2�A��a�
��	�i��_��aRs�q��;�R���gbo_�^�N��^���]����Iz>2OY����*�	a����n�����	����xZ!)�K����>6�����
���G�(�q�@\��16�~���z�P�,��Ea_q��K�s��G�X��3�~f�'X��������%�o_��1_�Gk���T��e��c�~�p-�x�,�en�2�Y&�Y.�-�75�������&��R��Q,#@W��/u����%��E/�����GH�,\R��<"k,��U6�@=��q X�W���l�a���(�Jz���
�e`1�	�RP��j���~��a Q�)���^� ����CM?�4y��%�w���/�;yay��#���eW��Ksg�`���P���&'KeE��o��
���6��SNI1�=�Mm:�$R�'J�i�]
�yhk��)>'E�������6���g�o�Q;�=��L�~�p����.��WW,��:,�k�.��tk����W�J��Lw��Z��FA�!�Av\Q�!�:����hJ��E��
n[B����Ia*�&�x�,�=�C8��C�SVh�yG��OD@�e������RW�]���con��D���c����U���c.��s8\���A�|M}H�SJu���{�j{���(�������4!H�6E$d�^q$03���y��K�<'fB&[&�EBaa�,�z��46����m�
��-��U��=�1=%U�����N�9m����=�)��K*�.�
���)��4����6���i�t0N�35�F!h'\q����+����]�g��=nU�s��3�h	�RO���iK1gGu	���sqL�:���:��z�����>c�7��i��
m���8�:S��q��1cQJ������&������Y�������������������-��p0N���Y�^m�Iz�_���8I��%���+i4�
�x>+������y�����5�J�������M�������-i*��V��������8���Xg�Y��n�44�l���6&)+vd�3�X3V�X'~y��k�/N�:qJS�If|S
=�����O,��d�+w7�ir�Kd���V\(�	9y+�A�z���r���GI����r�)fM#��9�v,��p[*���c�`�b��F��T*BE�>�7K��B����S���y�S�Z"�XO�D�$�����^�X�SOlRAw�rV�)�~�SU��T���b��~��"�9O����tE?���A
]����\>6���9���cJ_B	������ty\	IM|	N��G
[:/qta�s��c������*��<�0����g�0�fxi�M���<��v��h��I|f���R�TMLv�Y'��^����.�E�`;E���cQ�g��6F�8�C�qb�(.e.�H��TD8�������C��,��2\U8��C�b�����S�)��I����S�4hl��:�(��rO�
"U�zm�o?�s����3���a��^�U�C��Q;#�f��t.�T,�c�4��ZZBJ$*
r�0��sQ�;z^G���[��7{
�q�<�)r(q�_H�S����e�������w|�/N.����N����|�������'���	������\�����������h�Q���#�$�N�Gc���9,|,fwh��!�Y� e:��U��,s*�N�~V-�%s��|�'uX��2��``����<
����m���p���V!<r���X�w0�\�r������~/k^�G�l�Z������X�.w6�:�P�\h������������m@6\��-�(X'�a�l��G�<�^���X�<jt[�hT����0����Aagh��@��\������o|,3
���O��y�V���zv����<M��>�Z�'�/�?�����e��V�[�|>��� 5�3HC~`��&|`�X(u$;]Z����.�����=�*X�^���/�J��U[�N�]��?�b�@�h]`���s�K���c��i����q����;s�I��Q�J�Q�9��C������@:l���w��p���j��((��+')�1�r��Q4�F�EU��(8�K�����T�>�&��V��R����f������Er<���,u �T�vNg{��T{U(�8-�j��4Z{���>��'���{�j=G&��}Es[��*v��EC��"F5��i�O*���?�6��X��C0��B2����}��{Z������H���i�]��T�~|��`���	yk4)�n����-�O�i1���y��p�	I@�VN�"�)�h`O���fT�,���N����j1�����x�ar���'&���>q��1TM�OaO�1EU����M2��p��u*�}�sD���]�)����)! �X��6����GQ0h�2��Ti! �hZs�v�$�7�bJ���E8��<�5�O����#J~C���!f5�'�S�$���6��`�����O=�p7�p
�u��GGN����v��~��p����Y>�C���6������<�����/N���z��/P�$�,���E7�!�9)�M��P��.^�I�	�z�	�����r�F����������z<��^7X�UtuS�	L������ ���k:��l�U����3�H��\�z{|��E�; �E�G$��� �B��5��hX���Fs�.�C��L,R�(�p�	���^SC#��z�a_��������!R^?=N�q�g�C�,��E��11��s��?�����7o�dl��'�\�5}�(�Y��������8���C�K�`C���Qg����v��w�8^���4��r\��!��T�#w��uT�,�(cf�9�����r w�d�!b�A��GD�3�&]d���"R\���K���l6O�q�f����5�d����XQ�O\finI��U��!w�?�p�B��T�/�%x��ubN���������~7����%�E�{��lQ��=��D��	�����a����Mo(����b�r����W����#!�/��a��m"0�3j��Q�^��#H�o���N4��jFr@3@�~Qx7����bu��������hx��'�$M���`��4��'"r.�dh�[����k�w�hq���������zP������.�\��������h�jo��R��j�YMp�9*���z��3xo����n7�����/����o���4��V+�|����V����!}c��'�>�Sw���_��(~��&_�n����v����N4hzQ3:
�Q�5hD�������y�h<��1���>t���a���'�W��@V���9Z�����`v�l�o��1{�W��ag��1��?n=n6�a�4�/�}P;�q���w�����pSO�g!����u���w?d��t@�����IV�u\q!3��:��j���R6<��y~��l��x���*WBR��K�y<���,��eU�4ZH�P
������\)�,Ev{���p��C)H���,&�#��������h2W��'IN��`�������5�}����pr��2�
���N7�	�q��FWf�����]��w�~�,s�/�,��"��p��|�
f��8��q"<h�]&_���z.��^	����}�f�1��f�^��O���������|��'����A��Gf5���������6��������rkPjm���'���^!
�9b���n�j�dt_��L�7��&�X����t����V�������c�G7^���v����
kL6;GX"��mP���
���<�z��aD�5a�C�s��Z���0��a��D��J}v����z��M;I�RE��o0�]
�q�%*��hc��V�KE��[�5����3�_��t!�NU�����B���+�B)�K�O��~/��J��d:�Z��$�������@G���qW��#"x[�*��f#-�,��z�^��S�b��&S��$�;(��*�H,��f�A{���F�Gm�rx8\��$�c@j(������������^�U#�o����a���C�"�(����D
��&�6-Co0Q�>qO��u�O�U�����L�N�������]�j7��K�:9a��Zvp���9� ��6])V�H����������������8-*)�@%�`���Mz�
J	O��z�P�)D)�[�h\_G�1�$$���d4]��k���u*MA��n�v���mw�z}p������N����Sim���G�_D�D�
7��0�a�%�����B_�����?8t6����<�h�A/2p��"P(OI����K���Q7�r��j��~i��4�^JG�?�V?"��������2�Iy���Y��2f(X�;�/PmO�{�"�e�z;L���'s�MZ���M4��
�������u9�Y�dW���l�n�M���c#Z�7��p�����p6�2�J���{��'��pA�����Zh�j�M����7�go8%������c�YJ��Nvu�c��[��p�n�|0���v2 ���>&V�A���"Tk,I����*Pa�WJ���� �x���X�T��*�T:���|�UD�Jx�9v����Xs��3)*��P�4��$���S��6c��7�C�g�;0ON�=p�C�c��Oe�����2�SXC��U=y���f��J���l���#�H�&�H��/?��
���\�A@�������~�,���GG��:
?��6����Mm1vJ��n��HT���������:(���l�S�A*�Qr��v�$��<��,?lyQ�^o
�N## ��S�N�%Y��%Y��v5IL�������5�����h���mB/����X0X�x!�lQ��\�I��H���z�q�6{�_���cX�=��P�A_��vS��;L�bu[sZ����s��(
���yE�k��/���'#;l59
"X�	|o��.{4�'�V"��j:���H�b*_���Z�w���u�GDz��>�F��b���3?p�O�0?A����C��&C��Bq"	Gd�Y����<Ot:bf4���G�X��f�`B@b_�k`_�Y�����6����YN��Dz�EC����Q���������<}��$X�4�_s���xu���E���fo�p��x�r<4�,���
%��}?h��������{��;���|�y�������S1R4�WF��|�>��\N��v
��k��5��s��g�����l6Z	�O�_���wq��g���}�*��=#o���2�7���0������z�������
�����Xw:��|?[_��7�8x�g��N���i���_��mB1����p�	1RBT��\��q-8[��1T
'�5����%W�&#�>z0_��E����z<Y����S�l�,_��p}Bo�e�;%&f��o�p	���������wG�����L�L�#�}���l��F���Zms�5�����Zj�J������%G��e������oq;yX~�6�\����
��y��U�y�����t��A���lI�T+�>j�������[���o._���.���}:ma���(��;WF��b3�;��rc=;�������^s�����7/z��..{/��_��[[��%'�~����_��%e��,W\��^>`�9�H���s���`c�2�>H'� BH�%��)��E���4����V�� <�#����"e64��Np�D�^z�XO��d�=�+�a��}��2�z��ib��V���D���hbd���<Qo��O��0*-�<�M���n�����b����t)����;�-��P.=~�Ih����;p	S�]r\j)��S��!V����tg+�6^a��
��h�2A�s�t�As���j�&h/ �&�R��]V%��ob���vf��'��� �[
|t���u6Q�E�/��N�j��i8�:�T�UTu��l�:�����6!f�������k��\��Pa���\c���"\�=h�����O��}���T�������R��f���V�+�S��W%$��juC����O�`o8��L H�S6�0>��N`�YX����}�&����8k>I���4��K�����R��
�dO�y�>���*��T�t6I��!�|}W.���R�@o�9�Q_1wk-��6}p�Fvf�8/��w>��[$T�k�)q��!����y�	�s��"`�X��&�!�� ���
��L(��,�A���^�
������w����\N� ������Ue�O����Mh�JP�N��5�j�?���
m�	���#<!Eg���F{x�sn,R�>�Q�Y��S��dvR:y������?Nz���w��8��'��������\G!x�).�*��uU��XS� #t����������&y����A$��p��h��?��}8V����%yT�[�G�8&��d�%��S��~�z�[O�H2���7����o8������.N����^��L�3T1a��
�����y>��F^���49	[��=::|A���=l%�.��T�����EC���#��'4�D���U����c���=�`����^U���-���b��J��n.�G�0��G��oDjP���t0�W��f��A�������''	������Y��������l�?��>�7��h���L�*��.�?\��8y�)?^UV^=��������A=���;�������\�yy~�=�?��?|��*����;9?���@���'��MV��5���yp���AU!����'��'\�H8��/��B����n��q���O�a�#��}�����<���@��gLV�������0����Z0m���stB��"��t�)`������1�hXu��h�����@�;YH��f�K��E�#�Q
rK����>�}
�W��O�9'����Cb����%����S�KuQp�9_�t��x�{r���,=����_���)���y�E���P�=p~}c$�%l��@��mq�H`����[$X_
	@��J�����u��/��@��+
�����^6X��4}��
��ep�b/g����bNp{0�%���I�;�� ���<J�)���CX	�rX�;q*�_�.O�8��������'p5E23��-��~t���~��u��W8���WnW�5�����C��76<�b���9�	y���XJ�!?F�>TA)_4��r�����<x�Rp��V�>PV��I�.T\��6��F���K9S��d�z��_m[��r7���z��7��t����c��
��5j���MV��;�`�u+�l�W�Y|��e��i[��_�����o�r��z��hV�E
l��p:�A�h
���d&�]� �����V�l��f����t�oA���������N��~�y
�=��+��F������{�vs4�#�_�q����u��z��+q����Y<�����f�w�x��.�|����L���
�������f���vrWO'H�v�5sx��Z���W������������o��}�7qU�"����m�A�>���AT��h��"^�I�i�����}�A!X����i�)gU��RP�����I��yh�-��9�!���TfP��5��T4&��$��A�v�����O�%�U���/��g�����t�����j�Y7�Y��=�R�z���v���joA���7��
WCQ"E�wR'�q	R��9.��.ts\�R�d����|��V��"v�GU�].��
���Z��@��������nC�#n^�����k��xB%��c(|S����K:=��Y�dI)h��h!:���A[K6V#'�<�f���\��m�H��Y#q�fV�w�@$/��z4h���^Mpv.�G���b/[�����\K4<��N_���0��qz,���OL}�2`�d0�po� �/��a)�	���FV�^bT>�{*�����>�d���3��H���o$j������i
8��OX��&+���fs��>7��3���i�$�$��
��6	�1�����V5�*���hDL{�����g���X�V��4m��=���ngZ�����xC�KSS�}��~���\]���jO����r�j��j��;�hA+-5�������������@����������o_����h�U��h�q����9��
���+�!,3���*��ez�@8�> ��'ob�����g��gv`�K��/�Y�$���}������U�G�T�q���?��p�������g"l�}q��S��GWQ"Z)�KQ"J'}�D�(���G��F�~G�B��'>���
�g����$:c��V.���kO��RNOYp���������-���	���"��JK�C��g�]*�:��V��bob�C�_���uy����e?�������Di��-������d=�2W�G�1~1�������*��D���W�@��,���L���l�YL0k"���x��ll�^�Md��l�4����d���b>;t� =~)�����������''�����z7�$�����?MS�����s������mg=���8�I�������r3p��$�(�N����Xq�#``*���+�	P}L ���;�ou�i��JE�T9������j-c���
����T���U���|1�9���z82�~H���7j{��>�SR]�c$���k4L������o���r��P�tu1���@���?j��0���mcpJ�1d�S�1���Z@O�8.7�=��_�y����(��UG�*�a�e��jC�o���s�PR������R�Mt������Y������~"w�Se.Dxj������%��&P�{��7��hTlL1B�_�i���t?�Mk��bxk��	�|�d�N��J���
+��'A~��-�i�S��*�����(���Xe��<RAX	�����	�W3�,�H���1��zi��Od{y,C(�!��3BO��Oc
��`P@W.�,��v�Y��@�����C�:�5�"�j]5SY��$��������b�1Sf�36KY�m������3�L��H"�c���PV�,���R��C<�������H���	J�"�R���A��[����Xh��!.�D~�+�����B�9Q��_���M=���1���a��g�`�~��}�����(��-N��!P�}���#��N��ADZ?8�&+�t��d/X|tMG�B&cL�
��P�Z�o�,���y�'\1�\�MK�������k�������	ASV�HK�R$+_��������&���E ?>Aa���6+H�-�=�^�u�Xn���Y���;�����Cm�pw�������ZMw����i�f�3�N�����r ���;:���f0I��rg1�B�	-sb[}A����A�rD�*�Y�}��%r3Sd�b�B
��6s����1w���_�5�*��Nw/�m&�A�R�b�k0D����
��p<!�3�����pnqMWZ��I�AL~�5�PL���U�F��]�zz�"* ����"��'��Cc�o���*����8T�l��JJw����9C:���iw�8�Fs|�V�H������0������1�u�}�v��Di����7��q����hA��d|n�d<�M,���U#��@&��lH;p�9,�%s�G�_r}d��<d-2��EgC��x��=���w��McK�L �p�Q�o
�V���
�g>"�g��Q|Zu(�Z�)FN��A�C�\Y�W��.��e��{���������[jO*L�������P�0��-O��������.��1��r>�������\�����c/%
�W��u�^�r�.��$�0�\�Y���� [��e��e���[
���HZM���-���o%<YT����
��'��-}X$�+���2��j��vJ#���*�V��#��f���EGMvk��c�tW��vZ9T������@psL�P@0���?"n�,�5���-mSv���3�6��$s(�,E?�,�6�����j��,��!����[y���l�t{��n6�9I8���J���T`s����p7�x�bG9�&fc�k�w��&�)kSG��&~&��a�����X�_����-Q=�1����&EKbQ�p0����������|)����|J�E�R"�hs��A��D|)��� wU�$	��/�b�)	)�Ir�/U�I�A
��Yp���M��7�01�|]��lf�n�����@�����m�O$q�
�\}/C�����Jf�k�;�kC�;�k���nNz������F��["m�������8�H��i��3N�:�8	�K���V~8���3�p���a���Gx	�~6t��}o��^�Gs��$pL����
�1���0>4���]o��Fv������U�����l�~�n����-t�Zw����k�p���6��1|jDO����X��7�q�Uv�S��F���
�YF�D5��VL9�XG���_�%.o�YC�X���b��_~��K�T2�.8�)�q�� �W�����_�6�����}����0����7���7���v{8�#?j���Z��p����~�������q�s�����_�]*���R7]��T�k8#\-nz�p�J�dv��<�+gF�����
���_�����_�Q7���kyw��Kd�����v4�7� ��QS������E�&������4�UL�o�B%������
,{%,{m{�>nxJ�py�/g@K�����`r�Wr#���������m?�z]%�]mna.�y"z���	�K86Bbz	���5�yj-�KkyOC�'��V�i�aJ>��!�N�����J��X��%2�w[�;.�QK�J����%�O���*V��OA%�#`%�
�q���<`��<Z�U i��G;��T��pG�����=9??y�K��B����f�cX��U=�$���r5dbW���pzC:���1A/�m�c���Y��td�"*��Zo8�%�]�S�O�2�P_T*��W��wI�{ ��-� �'�2�eT����'Y*�:
��2!�/8!U�`�
�JD�^�r�k�(,�q�����������yWp|+�����P�����Jx��dm�t��@�I0���5r��������cA���e]D�8���d�|����O�p�5-Z���q�)��ST�;�_��_�|L�3O���rd���LR5t?�}���������jC���� �|�������3Z�O�Bcr�#�J��g*�������Fop�I�ofS����9��"Z�C%O���h4�#(fH=J\cY�����q�1Z����j6��� ��,%�d����������>�Ic�g$�I�x�-X]6�XOo�
q�n�S���t�%sn���1.Py���U`�������q@��A~��SH&HZ����	���hI-�Oh���<��V�]a��>��p��.v�O��Z<�b����^�,��v��&=v<�E�x�8.��l�������T���m�0�{��HX����%i*/�P��)��|6��(�����_4o��\���R����.��x���A��iD��������)���U<x��*��*���7�����8d�������m�#���Kb�+P�^D�����,�<_���M��oN/O�_�R1�,
-n^��-=��9� S��"B1��Y���������B8^?�'�������4�������I=!�pNi����6���(���e�L�����Y��T"@�}���E�Hf��l��t���-�_����6S�[���.#�R:��fy�3�3Km��m��l(^^��q��V,���+Y��|2��J�
�E�7-l+�&���2m�<)O�T>��%���>��6�!��e�(��6��'&���7��nB��G�4�b.9�&s6�`�����1��(��PQ6��fA(������(W��(��.#��62��\c�����m���)L�\�#
�����;#������0��:u�Q.m�K��SD���g�������|���I`��@���z_���{[Q�L����&��#7X�X�V���5���"{���.]`7��nxl��I�w���&Q�,qX�*a�5e%�0�����b��aQ��t����^,e�8t�t���%�p7����^���!��V�������,yDI~_E>i�/��{?+������6���������������!v�.luPn��l���f�a	bJ���8���|��(f�L1Y�b��b
%�Id��
E��uG����M&U����!�n.�t~#����=���hbU�M*pm�%�a���v�F��vR�l�s��U"��iN�)y���y��`�-	����1�aGH��ogE8���P:�Fc%�x�`��U(Jb^�E�G~pg
�s��so�b>��J�U^3K�6����*��8�B��k�
�U�sBBEr;��Z;�`3�8�'�[��z����i��B�#�����_���������y����Is��o 2s
_�8���7�����Py�/��������a	���Xg����q3/���n����G�} ���>K�@�FJ�qIsW��Q������������j�Ym�^����W�#�aB���
�Z���d�O�t ���i�������?��y~+��f7u�ak��V�ox}�����f��w�<���[-G���� n271����iqSq��>o7�����Qs����D�5��i��"��������% D����H������2w�`*��y0�����f���'��>�CE�~�x����-������V��Qg������G���<vX4b����o����z=���j��2���9jS
���Z�QU>_�DT��TUr��%_�V|����z��h�Z���*�%�\�|	�8n���z�.h]��>y~�V^=����������h�Xy�f��>�<9?~�*������>�������	[��J��QJ�����Ka���H=�H���0���_��J���'���>����C�]���	�i�\|�xB.��������=>�����Hy~���I%���Io�9�q���~#p�q��+�������1(��c�4������Y�o�Ee��>x�=���R����8���M��7���,�����^��q��?�7(����i�x�[O������)0D2F��Z��g�'����E������o������\�H+��|���������9Pvv�^�`4����K�@��
�����#�����#)0���P|1���P���������6�n��d���o�?�B�LF{
�:R �x<������/�)C�DR���x	���I84�P�J���A�6�p���������]\^p��<�np�p��\_W��3_���;�����C�S�"�������y���W���-vv��/
 �	����N�����j��������
g:8�3����X���q���TxYnK�`�xb���&�->�=c�+-+j���XCK��)�u��@�p�z9�p�1gJ|y���~��<~<]_s�9��������Bx�R���f`��2��8�	sl���I������,Fy!-�1d
�����C�C�w4���
=(>� 1�`GC�C�p}����B��`j(l3�pbh�J�H�u;��+#�)�������q�����<f���XH��<[7�����)�g��JE����'i~Y �6 �v ����@����
����.��6�R@�x��o�*������-�/v*�� ������o�s��d�_���'<��O��_<��'<	��@��'M���=��#��#��#��#��#��#��#��#��#�]#�x-C��*h��4SI+$A���B!�^��p�/q�6���)�J������+lW����(���D8��Yn5�J�����T;�9���n�P�t]�v)F�]	����G���~�f�{���}������D|�27����,9����lH8�����v���3��Z<����e��Mo��gC�����o�xA��Y��[������	��2�!a�x��f
?-�������X����������e�^�F����K>�Y���Y��7��� �y����7���
��Wt�^�1zE���_t�~�1�E��cPt�A�1E�c���E��,:�f�1V�8t]��$
���|�0����x��_����8�m����d������z���I�����y�l%<7F�v+��z�9jw�"��~2�5-1�T�S���M������S��<�H0���=U�_����p:F�}#��|�U���A�����"���I���8`���Hk������S�;h �J�-��95����E�Cs#�'eP��a�!�.��� ���^"P�^@mzM���%\�*�F3������e��84>����,�)qr_4M'^d7�@sy���nS��dB���|>����\\��������'�$����hf��K�L��~���k���G��~����������So�d�k���O/z���w~��"��i�����e���B��u���|�),�\�+���OT�c������K����S1�
�ex�@+�G��x��������g~����#h

#35

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Jeevan Chalke (#34)

Re: Partition-wise aggregation/grouping

On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

1. Added separate patch for costing Append node as discussed up-front in the
patch-set.
2. Since we now cost Append node, we don't need
partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main
implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

With 0006 applied, cost_merge_append() is now a little bit confused:

/*
* Also charge a small amount (arbitrarily set equal to operator cost) per
* extracted tuple. We don't charge cpu_tuple_cost because a MergeAppend
* node doesn't do qual-checking or projection, so it has less overhead
* than most plan nodes.
*/
run_cost += cpu_operator_cost * tuples;

/* Add MergeAppend node overhead like we do it for the Append node */
run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;

The first comment says that we don't add cpu_tuple_cost, and the
second one then adds half of it anyway.

I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
because as you say it's used twice, but I don't think that should be
exposed in cost.h; I'd make it private to costsize.c and rename it to
something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in
particular, seems useless to me, since there's no provision for it to
be overridden by a different value.
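
The two charges quoted above can be read as a single per-tuple CPU term. A minimal standalone sketch, assuming the multiplier is 0.5 (implied by "adds half of it") and using PostgreSQL's default values for cpu_operator_cost (0.0025) and cpu_tuple_cost (0.01); the helper name merge_append_cpu_run_cost is hypothetical and is not a function in costsize.c:

```c
/* Name suggested in the review; the value 0.5 is an assumption. */
#define APPEND_CPU_COST_MULTIPLIER 0.5

/* Default values of the corresponding planner cost GUCs. */
static const double cpu_tuple_cost = 0.01;
static const double cpu_operator_cost = 0.0025;

/*
 * Hypothetical helper: per-tuple CPU charge for a MergeAppend node,
 * combining the small per-extracted-tuple charge with the Append-style
 * overhead (a fraction of cpu_tuple_cost).
 */
static double
merge_append_cpu_run_cost(double tuples)
{
	double		run_cost = 0.0;

	/* small charge per extracted tuple, set equal to operator cost */
	run_cost += cpu_operator_cost * tuples;

	/* Append-like node overhead, cheaper than a full cpu_tuple_cost */
	run_cost += cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER * tuples;

	return run_cost;
}
```

Under these assumptions, 1000 tuples are charged 1000 * (0.0025 + 0.005) = 7.5, so a comment covering both statements would need to say that a reduced share of cpu_tuple_cost is charged in addition to the operator cost.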

What testing, if any, can we think about doing with this plan to make
sure it doesn't regress things? For example, if we do a TPC-H run
with partitioned tables and partition-wise join enabled, will any
plans change with this patch? Do they get faster or not? Anyone have
other ideas for what to test?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Robert Haas (#35)

Re: Partition-wise aggregation/grouping

On Sat, Oct 28, 2017 at 3:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

1. Added separate patch for costing Append node as discussed up-front in the
patch-set.
2. Since we now cost Append node, we don't need
partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main
implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

With 0006 applied, cost_merge_append() is now a little bit confused:

/*
* Also charge a small amount (arbitrarily set equal to operator cost) per
* extracted tuple. We don't charge cpu_tuple_cost because a MergeAppend
* node doesn't do qual-checking or projection, so it has less overhead
* than most plan nodes.
*/
run_cost += cpu_operator_cost * tuples;

/* Add MergeAppend node overhead like we do it for the Append node */
run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;

The first comment says that we don't add cpu_tuple_cost, and the
second one then adds half of it anyway.

Yep.
But as David reported earlier, if we remove the first part, i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node, unlike before. So I thought it better to keep both, but I am not so
sure. Should we remove that part altogether, or add both in a single
statement with updated comments?

I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
because as you say it's used twice, but I don't think that should be
exposed in cost.h; I'd make it private to costsize.c and rename it to
something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in
particular, seems useless to me, since there's no provision for it to
be overridden by a different value.

Agree. Will make that change.

What testing, if any, can we think about doing with this plan to make
sure it doesn't regress things? For example, if we do a TPC-H run
with partitioned tables and partition-wise join enabled, will any
plans change with this patch?

I have tried doing this on my local developer machine. For a 1GB database
(TPC-H scale factor 1), I see no plan change with and without this
patch.

I have tried with scale factor 10, but the queries do not execute well due to
space and memory constraints. Can someone try that out?

Do they get faster or not? Anyone have
other ideas for what to test?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#37

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Jeevan Chalke (#36)

Re: Partition-wise aggregation/grouping

On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Yep.
But as David reported earlier, if we remove the first part, i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node, unlike before. So I thought it better to keep both, but I am not so
sure. Should we remove that part altogether, or add both in a single
statement with updated comments?

I was only suggesting that you update the comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#38

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 8 years ago

In reply to: Jeevan Chalke (#34)

Re: Partition-wise aggregation/grouping

On 10/27/2017 02:01 PM, Jeevan Chalke wrote:

Hi,

Attached new patch-set here. Changes include:

1. Added separate patch for costing Append node as discussed up-front in the
patch-set.
2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

Thanks

I applied the partition-wise-agg-v6.tar.gz patch to master and used the shard.sh example from /messages/by-id/14577.1509723225@localhost
The plan for count(*) is the following:

shard=# explain select count(*) from orders;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
Finalize Aggregate (cost=100415.29..100415.30 rows=1 width=8)
   -> Append (cost=50207.63..100415.29 rows=2 width=8)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_0 (cost=101.00..50195.13 rows=5000 width=0)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_1 (cost=101.00..50195.13 rows=5000 width=0)

We do calculate a partial aggregate for each partition, but to do so we still have to fetch all the data from the remote host.
So for foreign partitions such a plan is absolutely inefficient.
Maybe it should be combined with some other patch?
For example, with the agg_pushdown_v4.tgz patch /messages/by-id/14577.1509723225@localhost ?
But it does not apply after the partition-wise-agg-v6.tar.gz patch.
Also, postgres_fdw in 11dev is able to push down aggregates without the agg_pushdown_v4.tgz patch.

In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
there is the following check:

  /* Partial aggregates are not supported. */
+       if (extra->isPartial)
+           return;

If we just comment out this check, the produced plan will be the following:

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
----------------------------------------------------------------
Finalize Aggregate (cost=308.41..308.42 rows=1 width=8)
   -> Append (cost=144.18..308.41 rows=2 width=8)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

And it is actually the desired plan!
Obviously such an approach will not always work. FDW really doesn't support partial aggregates now.
But for the most frequently used aggregates (sum, min, max, count), aggtype==aggtranstype and there is no difference
between partial and normal aggregate calculation.
So instead of the (extra->isPartial) condition we can add a more complex check which will traverse the pathtarget expressions and
check whether they can be evaluated in this way. Or... extend the FDW API to support partial aggregation.
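
To make the distinction between "simple" and "complex" aggregates concrete, here is a minimal standalone sketch. It is only an illustration and does not use PostgreSQL's catalog types: SumState, AvgState, and the three functions are hypothetical names. The partial state of a sum has the same shape as its final result, while the partial state of an average is a (sum, count) pair that needs a separate final step.

```c
#include <stddef.h>

/* Partial state for sum(): same shape as the final result. */
typedef long long SumState;

/* Partial state for avg(): a (sum, count) pair, unlike the final result. */
typedef struct AvgState
{
	long long	sum;
	long long	count;
} AvgState;

/* Combine per-partition partial sums; the combined value is already final. */
static SumState
combine_sums(const SumState *partials, size_t n)
{
	SumState	total = 0;

	for (size_t i = 0; i < n; i++)
		total += partials[i];
	return total;
}

/* Combine per-partition avg states; a final step is still required. */
static AvgState
combine_avgs(const AvgState *partials, size_t n)
{
	AvgState	total = {0, 0};

	for (size_t i = 0; i < n; i++)
	{
		total.sum += partials[i].sum;
		total.count += partials[i].count;
	}
	return total;
}

/*
 * Final step for avg(), valid only on the fully combined state.
 * Returns 0.0 for empty input as a simplification (SQL would return NULL).
 */
static double
finalize_avg(AvgState state)
{
	return state.count == 0 ? 0.0 : (double) state.sum / (double) state.count;
}
```

With partitions holding partial averages (10, 2) and (30, 3), the combined state is (40, 5) and the final average is 8; averaging the two per-partition averages (5 and 10) would give 7.5 instead, which is why such an aggregate cannot simply be pushed down as a normal aggregate.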

But even the last plan is not ideal: it will calculate predicates at each remote node sequentially.
There is a parallel append patch:
/messages/by-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ@mail.gmail.com
but ... FDW doesn't support parallel scan, so parallel append cannot be applied in this case.
And we actually do not need parallel append with all its dynamic workers here.
We just need to start the commands at all remote servers and only after that fetch the results (which can be done sequentially).
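
A rough piece of arithmetic shows why starting all commands first helps. This sketch assumes each remote server needs a fixed time to compute its aggregate, that the servers work independently, and that transferring the single result row is negligible; the function names are hypothetical.

```c
#include <stddef.h>

/*
 * Sequential execution: each remote aggregate is started only after the
 * previous one has finished, so the elapsed time is the sum.
 */
static double
sequential_elapsed_ms(const double *remote_ms, size_t n)
{
	double		total = 0.0;

	for (size_t i = 0; i < n; i++)
		total += remote_ms[i];
	return total;
}

/*
 * Overlapped execution: all commands are sent first and the results are
 * fetched afterwards, so the elapsed time is bounded by the slowest server.
 */
static double
overlapped_elapsed_ms(const double *remote_ms, size_t n)
{
	double		slowest = 0.0;

	for (size_t i = 0; i < n; i++)
	{
		if (remote_ms[i] > slowest)
			slowest = remote_ms[i];
	}
	return slowest;
}
```

For three servers taking 120, 80 and 100 ms, the sequential plan takes about 300 ms under this model, while the overlapped one takes about 120 ms, even though the results are still fetched one after another.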

I am investigating the problem of efficient execution of OLAP queries on sharded tables (tables with remote partitions).
After reading all these threads and the corresponding patches, it seems to me
that we already have most of the parts of the puzzle; what we need is to put them in the right places and maybe add the missing ones.
I wonder whether somebody is already working on this and whether I can somehow help here?

Also, I am not quite sure about the best approach to parallel execution of a distributed query at all nodes.
Should we make postgres_fdw parallel safe and use parallel append? How difficult will it be?
Or, in addition to parallel append, should we also have an "asynchronous append" which will be able to initiate execution at all nodes?
It seems to be close to merge append, because it should simultaneously traverse all cursors.

It looks like the second approach is easier to implement. But in the case of a sharded table, a distributed query may need to traverse both remote
and local shards, and this approach doesn't allow several local shards to be processed in parallel.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#39

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 8 years ago

In reply to: Konstantin Knizhnik (#38)

1 attachment(s)

Re: Partition-wise aggregation/grouping

On 11.11.2017 23:29, Konstantin Knizhnik wrote:

On 10/27/2017 02:01 PM, Jeevan Chalke wrote:

Hi,

Attached new patch-set here. Changes include:

1. Added separate patch for costing Append node as discussed up-front
in the
patch-set.
2. Since we now cost Append node, we don't need
partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main
implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

Thanks

I applied the partition-wise-agg-v6.tar.gz patch to master and used the
shard.sh example from
/messages/by-id/14577.1509723225@localhost
The plan for count(*) is the following:

shard=# explain select count(*) from orders;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
Finalize Aggregate (cost=100415.29..100415.30 rows=1 width=8)
   -> Append (cost=50207.63..100415.29 rows=2 width=8)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_0 (cost=101.00..50195.13 rows=5000 width=0)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_1 (cost=101.00..50195.13 rows=5000 width=0)

We do calculate a partial aggregate for each partition, but to do so we
still have to fetch all the data from the remote host.
So for foreign partitions such a plan is absolutely inefficient.
Maybe it should be combined with some other patch?
For example, with the agg_pushdown_v4.tgz patch
/messages/by-id/14577.1509723225@localhost ?
But it does not apply after the partition-wise-agg-v6.tar.gz patch.
Also, postgres_fdw in 11dev is able to push down aggregates without
the agg_pushdown_v4.tgz patch.

In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
there is the following check:
 /* Partial aggregates are not supported. */
+       if (extra->isPartial)
+           return;
If we just comment out this check, the produced plan will be the following:

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
----------------------------------------------------------------
Finalize Aggregate (cost=308.41..308.42 rows=1 width=8)
   -> Append (cost=144.18..308.41 rows=2 width=8)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

And it is actually the desired plan!
Obviously such an approach will not always work. FDW really doesn't
support partial aggregates now.
But for the most frequently used aggregates (sum, min, max, count),
aggtype==aggtranstype and there is no difference
between partial and normal aggregate calculation.
So instead of the (extra->isPartial) condition we can add a more complex
check which will traverse the pathtarget expressions and
check whether they can be evaluated in this way. Or... extend the FDW API to
support partial aggregation.

But even the last plan is not ideal: it will calculate predicates at
each remote node sequentially.
There is a parallel append patch:
/messages/by-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ@mail.gmail.com

but ... FDW doesn't support parallel scan, so parallel append cannot
be applied in this case.
And we actually do not need parallel append with all its dynamic
workers here.
We just need to start the commands at all remote servers and only after that
fetch the results (which can be done sequentially).

I am investigating the problem of efficient execution of OLAP queries on
sharded tables (tables with remote partitions).
After reading all these threads and the corresponding patches, it seems to me
that we already have most of the parts of the puzzle; what we need is to
put them in the right places and maybe add the missing ones.
I wonder whether somebody is already working on this and whether I can somehow help here?

Also, I am not quite sure about the best approach to parallel
execution of a distributed query at all nodes.
Should we make postgres_fdw parallel safe and use parallel append? How
difficult will it be?
Or, in addition to parallel append, should we also have an "asynchronous
append" which will be able to initiate execution at all nodes?
It seems to be close to merge append, because it should simultaneously
traverse all cursors.

It looks like the second approach is easier to implement. But in the case
of a sharded table, a distributed query may need to traverse both remote
and local shards, and this approach doesn't allow several
local shards to be processed in parallel.

I attach a small patch for postgres_fdw.c which allows concurrent
execution of aggregates by all remote servers (when they are accessed
through postgres_fdw).
I have added a "postgres_fdw.use_prefetch" GUC to enable/disable
prefetching data in postgres_fdw.
This patch should be applied after the partition-wise-agg-v6.tar.gz patch.
With the shard example and the following two GUCs set:

shard=# set postgres_fdw.use_prefetch=on;
shard=# set enable_partition_wise_agg=on;
shard=# select sum(product_id) from orders;
sum
---------
9965891
(1 row)

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
----------------------------------------------------------------
Finalize Aggregate (cost=308.41..308.42 rows=1 width=8)
   -> Append (cost=144.18..308.41 rows=2 width=8)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

The sum aggregate is calculated in parallel by both servers.

I have not tested it much; it is just a proof of concept.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

fdw_prefetch.patch (text/x-patch)

270a271,272
> void _PG_init(void);
> 
372a375
> static void prefetch_more_data(ForeignScanState *node);
425a429,430
> static bool fdw_prefetch_data;
> 
1377a1383,1387
> 	create_cursor(node);
> 	if (fdw_prefetch_data)
> 	{
> 		prefetch_more_data(node);
> 	}
2989a3000,3010
> static void
> prefetch_more_data(ForeignScanState *node)
> {
> 	PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
> 	char		sql[64];
> 	snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
> 			 fsstate->fetch_size, fsstate->cursor_number);
> 	if (!PQsendQuery(fsstate->conn, sql))
> 		pgfdw_report_error(ERROR, NULL, fsstate->conn, false, sql);
> }
> 
3019c3040,3042
< 		res = pgfdw_exec_query(conn, sql);
---
> 		res = fdw_prefetch_data
> 			? pgfdw_get_result(conn, sql)
> 			: pgfdw_exec_query(conn, sql);
3049d3071
< 
3051a3074,3075
> 		if (!fsstate->eof_reached)
> 			prefetch_more_data(node);
4836a4861,4879
> 
> static bool
> contains_complex_aggregate(Node *node, void *context)
> {
> 	if (node == NULL)
> 		return false;
> 
> 	if (IsA(node, Aggref))
> 	{
> 		Aggref* agg = (Aggref*)node;
> 		return agg->aggtranstype != agg->aggtype;
> 	}
> 
> 	return expression_tree_walker(node,
> 								  contains_complex_aggregate,
> 								  context);
> }
> 
> 
4866c4909
< 		if (extra->isPartial)
---
> 		if (extra->isPartial && expression_tree_walker((Node*)extra->pathTarget->exprs, contains_complex_aggregate, NULL))
5190a5234,5246
> }
> 
> void _PG_init(void)
> {
> 	DefineCustomBoolVariable(
> 		"postgres_fdw.use_prefetch",
> 		"Prefetch data from cursor",
> 		NULL,
> 		&fdw_prefetch_data,
> 		false,
> 		PGC_SUSET,
> 		0,
> 		NULL, NULL, NULL);

#40

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Konstantin Knizhnik (#38)

Re: [HACKERS] Partition-wise aggregation/grouping

On Sun, Nov 12, 2017 at 1:59 AM, Konstantin Knizhnik <
k.knizhnik@postgrespro.ru> wrote:

On 10/27/2017 02:01 PM, Jeevan Chalke wrote:

Hi,

Attached new patch-set here. Changes include:

1. Added separate patch for costing Append node as discussed up-front in
the
patch-set.
2. Since we now cost Append node, we don't need
partition_wise_agg_cost_factor
GUC. So removed that. The remaining patch hence merged into main
implementation
patch.
3. Updated rows in test-cases so that we will get partition-wise plans.

Thanks

I applied the partition-wise-agg-v6.tar.gz patch to master and used the
shard.sh example from https://www.postgresql.org/message-id/14577.1509723225%40localhost
The plan for count(*) is the following:

shard=# explain select count(*) from orders;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
Finalize Aggregate (cost=100415.29..100415.30 rows=1 width=8)
   -> Append (cost=50207.63..100415.29 rows=2 width=8)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_0 (cost=101.00..50195.13 rows=5000 width=0)
         -> Partial Aggregate (cost=50207.63..50207.64 rows=1 width=8)
               -> Foreign Scan on orders_1 (cost=101.00..50195.13 rows=5000 width=0)

We do calculate a partial aggregate for each partition, but to do so we
still have to fetch all the data from the remote host.
So for foreign partitions such a plan is absolutely inefficient.
Maybe it should be combined with some other patch?
For example, with the agg_pushdown_v4.tgz patch
/messages/by-id/14577.1509723225@localhost ?
But it does not apply after the partition-wise-agg-v6.tar.gz patch.
Also, postgres_fdw in 11dev is able to push down aggregates without
the agg_pushdown_v4.tgz patch.

In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch
there is the following check:
/* Partial aggregates are not supported. */
+       if (extra->isPartial)
+           return;
If we just comment out this check, the produced plan will be the following:

shard=# explain select sum(product_id) from orders;
                           QUERY PLAN
----------------------------------------------------------------
Finalize Aggregate (cost=308.41..308.42 rows=1 width=8)
   -> Append (cost=144.18..308.41 rows=2 width=8)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         -> Foreign Scan (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

And it is actually the desired plan!
Obviously such an approach will not always work. FDW really doesn't support
partial aggregates now.
But for the most frequently used aggregates (sum, min, max, count),
aggtype==aggtranstype and there is no difference
between partial and normal aggregate calculation.
So instead of the (extra->isPartial) condition we can add a more complex check
which will traverse the pathtarget expressions and
check whether they can be evaluated in this way. Or... extend the FDW API to support
partial aggregation.

As explained by Ashutosh Bapat in his reply
/messages/by-id/CAFjFpRdpeMTd8kYbM_x0769V-aEKst5Nkg3+coG=8ki7s8Zqjw@mail.gmail.com
we cannot rely on just aggtype==aggtranstype.

However, I have tried pushing partial aggregation down to the remote server and
have also submitted a PoC patch here:
/messages/by-id/CAM2+6=UakP9+TSJuh2fbhHWNJc7OYFL1_gvu7mt2fXtVt6GY3g@mail.gmail.com

I later removed these patches from the Partition-wise-Aggregation patch set,
as that is altogether a different issue from the one in this mail thread. We might
need to discuss it separately.

But even the last plan is not ideal: it will calculate predicates at each
remote node sequentially.
There is a parallel append patch:
/messages/by-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ%40mail.gmail.com
but ... FDW doesn't support parallel scan, so parallel append cannot be
applied in this case.
And we actually do not need parallel append with all its dynamic workers
here.
We just need to start the commands at all remote servers and only after that
fetch the results (which can be done sequentially).

I am investigating the problem of efficient execution of OLAP queries on
sharded tables (tables with remote partitions).
After reading all these threads and the corresponding patches, it seems to me
that we already have most of the pieces of the puzzle; what we need is to put
them in the right places and maybe add the missing ones.
I wonder whether somebody is already working on it and whether I can somehow
help here.

Also, I am not quite sure about the best approach to parallel execution
of a distributed query at all nodes.
Should we make postgres_fdw parallel-safe and use parallel append? How
difficult will that be?
Or, in addition to parallel append, should we also have an "asynchronous
append" which would be able to initiate execution at all nodes?
It seems to be close to merge append, because it should simultaneously
traverse all cursors.

It looks like the second approach is easier to implement. But in the case of a
sharded table, a distributed query may need to traverse both remote
and local shards, and this approach doesn't allow several
local shards to be processed in parallel.

Interesting idea of "asynchronous append". However, IMHO it deserves its own
email-chain.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#41

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Robert Haas (#37)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Nov 2, 2017 at 7:36 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 1, 2017 at 6:20 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Yep.
But as David reported earlier, if we remove the first part, i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node, unlike before. And thus, I thought it better to have both, but I am not
so sure. Should we remove that part altogether, or add both in a single
statement with updated comments?

I was only suggesting that you update the comments.

OK. Done in the attached patch set.

I have rebased all my patches on latest HEAD which is at
7518049980be1d90264addab003476ae105f70d4

Thanks

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v7.tar.gz (application/x-gzip)

#42

Konstantin Knizhnik

k.knizhnik@postgrespro.ru

about 8 years ago

In reply to: Jeevan Chalke (#40)

Re: [HACKERS] Partition-wise aggregation/grouping

On 15.11.2017 13:35, Jeevan Chalke wrote:

As explained by Ashutosh Bapat in reply
/messages/by-id/CAFjFpRdpeMTd8kYbM_x0769V-aEKst5Nkg3+coG=8ki7s8Zqjw@mail.gmail.com
we cannot rely on just aggtype==aggtranstype.

Obviously this check (aggtype==aggtranstype) is not correct criteria for
all user defined aggregates.
I just did it as temporary work around for standard aggregates.

However, I have tried pushing partial aggregation over remote server
and also
submitted a PoC patch here:
/messages/by-id/CAM2+6=UakP9+TSJuh2fbhHWNJc7OYFL1_gvu7mt2fXtVt6GY3g@mail.gmail.com

I have later removed these patches from Partition-wise-Aggregation
patch set
as it is altogether a different issue than this mail thread. We might
need to
discuss on it separately.

...

Interesting idea of "asynchronous append". However, IMHO it deserves
its own
email-chain.

The main problem IMHO is that there are a lot of different threads and
patches related with this topic:(
And it is very difficult to combine all of them together to achieve the
final goal: efficient execution of OLAP queries on sharded table.
It will be nice if somebody who is making the most contribution in this
direction can somehow maintain it...
I just faced with particular problem with our pg_shardman extension and
now (thanks to your patch) I have some working solution for it.
But certainly I prefer to have this support in mainstream version of
Postgres.

There are two open questions, which I want to discuss (sorry, maybe once
again this is not the right thread for it):

1. Parallel append and FDW/postgres_fdw: should FDW support parallel
scan and do we really need it to support concurrent execution of query
on local and remote partitions?
"Asynchronous append" partly solves this problem, but only for remote
partitions. I do not completely understand all complexity of alternative
approaches.

2. Right now partition-wise aggregation/grouping works only for tables
partitioned using the new PG 10 partitioning mechanism. But it doesn't work
for inherited tables, although
there seems to be not much difference between these two cases. Do you
think that it will someday also be supported for the standard inheritance
mechanism, or is there no sense in it?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#43

David Rowley

david.rowley@2ndquadrant.com

about 8 years ago

In reply to: Konstantin Knizhnik (#42)

Re: [HACKERS] Partition-wise aggregation/grouping

On 16 November 2017 at 05:57, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

The main problem IMHO is that there are a lot of different threads and
patches related with this topic:(
And it is very difficult to combine all of them together to achieve the
final goal: efficient execution of OLAP queries on sharded table.
It will be nice if somebody who is making the most contribution in this
direction can somehow maintain it...
I just faced with particular problem with our pg_shardman extension and now
(thanks to your patch) I have some working solution for it.
But certainly I prefer to have this support in mainstream version of
Postgres.

I don't think it's fair to be asking about additional features on this
thread. It seems to me you're asking about two completely separate
features, with the aim of trying to solve your own problems.

It also looks to me that Jeevan has been clear on what his goals are
for this patch. Perhaps what you're asking for is a logical direction
to travel once this patch is committed, so I think, probably, the best
way to conduct what you're after here is to either:

a) Wait until this is committed and spin up your own thread about
your proposed changes to allow the PARTIAL aggregate to be pushed
into the foreign server, or;
b) Spin up your own thread now, with reference to this patch as a
prerequisite to your own patch.

I agree that what you're talking about is quite exciting stuff, but
please, let's not talk about it here.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#44

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: David Rowley (#43)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Nov 16, 2017 at 6:02 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:

On 16 November 2017 at 05:57, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

The main problem IMHO is that there are a lot of different threads and
patches related with this topic:(
And it is very difficult to combine all of them together to achieve the
final goal: efficient execution of OLAP queries on sharded table.
It will be nice if somebody who is making the most contribution in this
direction can somehow maintain it...
I just faced with particular problem with our pg_shardman extension and now
(thanks to your patch) I have some working solution for it.
But certainly I prefer to have this support in mainstream version of
Postgres.

I don't think it's fair to be asking about additional features on this
thread. It seems to me you're asking about two completely separate
features, with the aim of trying to solve your own problems.

It also looks to me that Jeevan has been clear on what his goals are
for this patch. Perhaps what you're asking for is a logical direction
to travel once this patch is committed, so I think, probably, the best
way to conduct what you're after here is to either:

a) Wait until this is committed and spin up your own thread about
your proposed changes to allow the PARTIAL aggregate to be pushed
into the foreign server, or;
b) Spin up your own thread now, with reference to this patch as a
prerequisite to your own patch.

I agree that what you're talking about is quite exciting stuff, but
please, let's not talk about it here.

+1 for all that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#45

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#41)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Nov 15, 2017 at 5:31 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

OK. Done in the attached patch set.

I have rebased all my patches on latest HEAD which is at
7518049980be1d90264addab003476ae105f70d4

Thanks

These are review comments for the last set and I think most of them
apply to the new set as well.

Patches 0001 - 0005 refactor existing code. I haven't
reviewed them in detail, checking whether we have missed anything in moving the
code, but they mostly look fine.

Comments on 0006
 /*
+ * cost_append
+ *      Determines and returns the cost of an Append node.
+ *
... clipped portion
+
+    /* Add Append node overhead. */
+    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+

I am wondering whether it's really worth creating a new function for a single
line addition to create_append_path(). I think all we need to change in
create_append_path() is add (cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR *
tuples) to path->total_cost.

+    /* Add MergeAppend node overhead like we do it for the Append node */
+    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+

With this change the following comment is no more true. Please remove it.
* extracted tuple. We don't charge cpu_tuple_cost because a MergeAppend
* node doesn't do qual-checking or projection, so it has less overhead
* than most plan nodes.
*/

+/*
+ * Arbitrarily use 50% of the cpu_tuple_cost to cost append node. Note that

May be reword it as " ... to cost per tuple processing by an append node ..."

+ * this value should be multiplied with cpu_tuple_cost wherever applicable.
+ */
+#define DEFAULT_APPEND_COST_FACTOR 0.5

I am wondering whether we should just define
#define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)
and use this macro everywhere. What else use DEFAULT_APPEND_COST_FACTOR would
have other than multiplying with cpu_tuple_cost?
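
As a rough illustration of the two spellings being compared here, the following is a minimal standalone sketch, not code from the patch. The variable cpu_tuple_cost is a local stand-in for the planner setting (0.01 is its documented default), and the two helper function names are hypothetical.

```c
#include <assert.h>

/* Local stand-in for the planner's cpu_tuple_cost setting (default 0.01). */
static double cpu_tuple_cost = 0.01;

/* Multiplier as defined in the patch hunk quoted above. */
#define DEFAULT_APPEND_COST_FACTOR 0.5

/* Alternative macro proposed in the review comment. */
#define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)

/* Hypothetical helper: per-node overhead written with the bare multiplier. */
static double
append_overhead_with_factor(double tuples)
{
    assert(tuples >= 0);
    return cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
}

/* Hypothetical helper: the same overhead written with the wrapped macro. */
static double
append_overhead_with_macro(double tuples)
{
    assert(tuples >= 0);
    return APPEND_TUPLE_COST * tuples;
}
```

Both helpers compute the same value, so the choice is only about readability at the call site: the first form shows the multiplication by cpu_tuple_cost explicitly, the second hides it behind the macro name.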

 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') =
t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN
---------------------------------------------------------------------------------------
+                                   QUERY PLAN
+--------------------------------------------------------------------------------
  Sort
    Sort Key: t1.c, t3.c
    ->  HashAggregate
          Group Key: t1.c, t2.c, t3.c
-         ->  Result
+         ->  Hash Join
+               Hash Cond: (t1.c = t2.c)
                ->  Append
-                     ->  Hash Join
-                           Hash Cond: (t1.c = t2.c)

That's sad. Interestingly this query has an aggregate, so the plan will use
partition-wise join again when partition-wise aggregation patch will be
applied. So may be fine.

- Append  (cost=0.00..0.04 rows=2 width=32)
+ Append  (cost=0.00..0.05 rows=2 width=32)

- Append  (cost=0.00..0.04 rows=2 width=4)
+ Append  (cost=0.00..0.05 rows=2 width=4)

We do have some testcases which print costs. Interesting :). I don't have
any objection to this change.

Comments on 0007

+       <para>
+        Enables or disables the query planner's use of partition-wise grouping
+        or aggregation, which allows  If partition-wise aggregation does not result in the
+        cheapest path, it will still spend time in creating these paths and
+        consume memory which increase linearly with the number of partitions.
+        The default is <literal>off</>.
+       </para>
+      </listitem>
+     </varlistentry>
+
May be we should word this in the same manner as partition-wise join like

Enables or disables the query planner's use of partition-wise grouping
or aggregation, which allows aggregation or grouping on a partitioned
table to be spread across the partitions. If the <literal>GROUP
BY</literal> clause includes partition keys, the rows are aggregated at
each partition. Otherwise, partial aggregates computed for each
partition are required to be combined. Because partition-wise
aggregation/grouping can use significantly more CPU time and memory
during planning, the default is <literal>off</literal>.

+
+Partition-wise aggregates/grouping
+----------------------------------

... clipped patch

+In above plan, aggregation is performed after append node which means that the
+whole table is an input for the aggregation node. However, with partition-wise
+aggregation, same query will have plane like:

s/plane/plan/

+ Append

... clipped patch

+PartialAggregate stage greatly reduces the number of groups and lose if we have
+lots of small groups.

To keep the discussion brief, I suggest we rewrite this paragraph as

----
If GROUP BY clause has all partition keys, all the rows that belong to a given
group come from a single partition and thus aggregates can be finalized
separately for each partition. When the number of groups is far lesser than the
number of rows being grouped, as usually is the case, the number of rows
processed by an Append node reduces apart from reducing the size of the hash
table or size of the data to be sorted. This usually improves efficiency of the
query. If GROUP BY doesn't contain all the partition keys, partial
aggregates can be computed for
each partition followed by combining partial aggregates from one or more
partitions belonging to the same group to compute complete aggregate for each
group. This improves efficiency of the query if the number of groups is far
less than the number of rows produced by the scan underneath.
---

I am not sure whether we should be discussing why this technique performs
better or when it performs better. We don't have similar discussion for
partition-wise join. That paragraph just describes the technique and may be we
want to do the same here.
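
The partial-then-combine case described in the suggested paragraph above can be sketched in plain C as follows. This is only an illustration under stated assumptions: it models count(*) with a fixed, small number of integer group keys, and the names PartialCounts, aggregate_partition and combine_partials are hypothetical, not planner or executor functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: group keys are small integers in the range 0..NGROUPS-1. */
#define NGROUPS 4

/* Partial aggregate state for count(*): one running count per group. */
typedef struct PartialCounts
{
    long counts[NGROUPS];
} PartialCounts;

/* Aggregate the rows of one partition into per-group partial counts. */
static PartialCounts
aggregate_partition(const int *group_keys, size_t nrows)
{
    PartialCounts result = {{0}};

    for (size_t i = 0; i < nrows; i++)
    {
        assert(group_keys[i] >= 0 && group_keys[i] < NGROUPS);
        result.counts[group_keys[i]]++;
    }
    return result;
}

/*
 * Combine step above the append: a group may have rows in several
 * partitions, so its partial counts are summed across all partitions.
 */
static PartialCounts
combine_partials(const PartialCounts *partials, size_t nparts)
{
    PartialCounts final = {{0}};

    for (size_t p = 0; p < nparts; p++)
        for (size_t g = 0; g < NGROUPS; g++)
            final.counts[g] += partials[p].counts[g];
    return final;
}
```

When the GROUP BY clause contains all partition keys, each group appears in exactly one partition, so the output of aggregate_partition() is already final for that partition and the combine step is unnecessary.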

+ *
+ * extra is the additional information required when we are doing aggregation
+ * or grouping below the append node. In case of partial partition-wise
+ * aggregation on a child node, we need to compute finalized step after the
+ * append, which cannot be done in this function. And thus if we have non-NULL
+ * value for extra, we call create_partition_agg_paths() to create an append
+ * node and finalization, if any.

May be we want to just say "extra provides more information about the
partitioned aggregation/grouping e.g path target, whether to use partial
aggregate and so on." When present we call create_partition_agg_paths() to add
paths for partition-wise aggregates.

-        add_path(rel, (Path *) create_append_path(rel, subpaths,
-                                                  rel->reltarget, NULL, 0,
-                                                  partitioned_rels));
+    {
+        if (extra)
+            create_partition_agg_paths(root, rel, subpaths, NIL,
+                                       NIL, NULL, 0,
+                                       partitioned_rels, extra);
+        else
+            add_path(rel, (Path *) create_append_path(rel, subpaths,
+                                                      rel->reltarget, NULL, 0,
+                                                      partitioned_rels));
+    }

I am wondering whether we could write a function to call appropriate one out of
create_append_path(), create_partition_agg_paths() or
create_merge_append_path() based on the presence of extra and/or pathkeys and
use it everywhere such a change is made. I don't know whether that will be
worth the code. But there are a handful places where such diffs are required.

-
-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent) ?
+                                   best_path->path.parent->relids : NULL);

While I can guess why this change is required, it may be better to separate it
into a patch of its own and adding some explanation in the commit message, for
other reviewers.

+    /* Copy input rels's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;

I am fine with this change, but Tom may not agree [1]/messages/by-id/CAFjFpRdUz6h6cmFZFYAngmQAX8Zvo+MZsPXidZ077h=gp9bvQw@mail.gmail.com. May be we should get his
opinion on this one.

     /*
+     * If input relation is partitioned, check if we can perform
+     * partition-wise grouping and/or aggregation.
+     */

Just like partition-wise join a concise "Apply partition-wise aggregation
technique, if possible." would suffice.

     dNumPartialGroups = get_number_of_groups(root,
                                              cheapest_partial_path->rows,
                                              gd,
-                                             parse->targetList);
+                                             make_tlist_from_pathtarget(target));

Can we guarantee that the output of make_tlist_from_pathtarget() will be same
as translation of parse->targetList for the given child? Even if not, may be
it's fine to pass slightly different tlist to get_number_of_groups() since it
doesn't depend upon the exact shape but right group column references.
Nonetheless something to test and verify.

  *
- * Determines whether parallel grouping and/or aggrgation is possible, or not.
+ * Determines whether parallel grouping and/or aggregation is possible, or not.
  * Returns true when possible, false otherwise.

Does this hunk belong to one of the refactoring patches or as a separate patch
correcting a typo?

+/*
+ * try_partition_wise_grouping
+ *
+ * If the input relation is partitioned and the partition keys are part of the
+ * group by clauses, each partition produces a different set of groups.
+ * Aggregates within each such group can be computed partition-wise. This

While these sentences are correct, I think the reason why we could compute an
aggregate at the level of each partition is because rows from a given group
belong to a single partition. So, I guess, we have to reword this as

"If the partition keys of input relation are part of group by clause, all the
rows belonging to a given group come from a single partition, each partition
producing a different set of groups. This allows aggregation/grouping over a
partitioned relation to be broken down into aggregation/grouping on each
partition.

If group by clause does not contain all the partition keys, rows from a given
group may be spread across multiple partitions. In that case, we can combine
partial aggregates for a given group across partitions to produce the final
aggregate for that group."

+ * might be optimal because of presence of suitable paths with pathkeys or
+ * because the hash tables for most of the partitions fit into the memory.
+ * However, if partition keys are not part of the group by clauses, then we
+ * still able to compute the partial aggregation for each partition and then
+ * finalize them over append. This can either win or lose. It may win if the
+ * PartialAggregate stage greatly reduces the number of groups and lose if we
+ * have lots of small groups.

I have not seen prologue of a function implementing a query optimization
technique explain why that technique improves performance. So I am not sure
whether the comment should include this explanation. One of the reasons being
that the reasons why a technique works might change over the period of time
with the introduction of other techniques, thus obsoleting the comment. But
may be it's good to have it here.

+    /*
+     * Grouping sets plan does not work with an inheritance subtree (see notes
+     * in create_groupingsets_plan). Thus do not handle grouping sets here.
+     */
+    if (query->groupingSets || gd)
+        return;

Even if that restriction is lifted, we won't be able to compute
"whole" grouping sets
for each partition, since grouping sets implies multiple group by clauses, each
of which may not have all partition keys. Those sets which have all partition
keys will be computed completely for each partition, but others will require
partial aggregation. I guess, we will need to apply partition-wise aggregation
at each derived group by clause and not as a whole-sale strategy.
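
The per-clause decision described above could look roughly like the sketch below. It is an assumption-laden illustration, not code from the patch: columns are modelled as bits in an unsigned mask, and PartitionAggStrategy and strategy_for_group_clause are hypothetical names.

```c
#include <assert.h>

/* Hypothetical result type: how one derived GROUP BY clause is aggregated. */
typedef enum PartitionAggStrategy
{
    AGG_FULL_PER_PARTITION,     /* clause contains every partition key */
    AGG_PARTIAL_THEN_COMBINE    /* at least one partition key is missing */
} PartitionAggStrategy;

/*
 * Decide the strategy for a single derived GROUP BY clause.  Both
 * arguments are bitmasks with one bit per column (an assumption made
 * for this sketch only).
 */
static PartitionAggStrategy
strategy_for_group_clause(unsigned group_cols, unsigned partkey_cols)
{
    /* A relation without partition keys is not a candidate at all. */
    assert(partkey_cols != 0);

    if ((group_cols & partkey_cols) == partkey_cols)
        return AGG_FULL_PER_PARTITION;
    return AGG_PARTIAL_THEN_COMBINE;
}
```

Applied to each grouping set in turn, this yields a mix of strategies for a single query, which is why a whole-sale choice for the entire grouping-sets query does not fit.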

Anyway, it doesn't look like a good idea to pass an argument (gd) only to
return from that function in case of its presence. May be we should handle it
outside this function.

+
+    /* Nothing to do, if the input relation is not partitioned. */
+    if (!input_rel->part_scheme)
+        return;
+
+    Assert(input_rel->part_rels);

A join between two partitioned tables, one of them being a dummy
relation, would have part_scheme set but not part_rels (see
try_partition_wise_join()). This assertion would
fail in such a case. Have you tested the case? Maybe we should just test whether
input_rel->part_rels exists, similar to generate_partition_wise_join_paths().
Also, how is a dummy input relation handled in this function? Do we need to
handle it?

+    nparts = input_rel->nparts;
+    part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
+    grouped_rel->part_rels = part_rels;

For a partial aggregation, we can't say that the child rels produced here are
partitions of the top grouped relation, so setting part_rels looks wrong. We
should set this only when a full aggregate is obtained from each partition.

+        scanjoin_target = copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
+        scanjoin_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                                 (Node *) scanjoin_target->exprs,
+                                                                 nappinfos,
+                                                                 appinfos);

Why can't we use input_child_rel->pathtarget? It should be same as the
translation of its parent's path target. I probably understand that's because
the input rel's path targets have been changed after the underlying join was
planned, a step which is not applied to the individual children. May be add a
comment here?

+        child_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                              (Node *) target->exprs,
+                                                              nappinfos,
+                                                              appinfos);
+        partial_target = make_partial_grouping_target(root, target);
+        partial_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                                (Node *) partial_target->exprs,
+                                                                nappinfos,
+                                                                appinfos);

We need both of these steps for any aggregate since parallel paths will compute
partial paths anyway. If that's correct, may be we should add a comment?

+ extra.inputRows = 0; /* Not needed at child paths creation */

Why? Comment should be on its own line.

+        if (!create_child_grouping_paths(root, input_child_rel, agg_costs, gd,
+                                         &extra))
+        {
+            /* Could not create path for childrel, return */
+            pfree(appinfos);
+            return;
+        }

Can we detect this condition and bail out even before planning any of the
children? It looks wasteful to try to plan children only to bail out in this
case.

+    /* Nothing to do if we have no live children */
+    if (live_children == NIL)
+        return;

A parent relation with all dummy children will also be dummy. May be we should
mark the parent dummy case using mark_dummy_rel() similar to
generate_partition_wise_join_paths().

+/*
+ * have_grouping_by_partkey
+ *

Somehow this name sounds like it would return true when GROUP BY contains only
partition key. May be rename as group_by_has_partkey? to indicate the

+ * Returns true, if partition keys of the given relation are part of the
+ * GROUP BY clauses, false otherwise.

Reword as " ... if all the partition keys of ... "

+static bool
+have_grouping_by_partkey(RelOptInfo *input_rel, PathTarget *target,
+                         List *groupClause)
+{
+    List       *tlist = make_tlist_from_pathtarget(target);
+    List       *groupexprs = get_sortgrouplist_exprs(groupClause, tlist);

Have we tested the case with multi-level partitioned table and children with
different order of partition key columns?

+        partexprs = input_rel->partexprs ? input_rel->partexprs[cnt] : NIL;
+
+        /* Rule out early, if there are no partition keys present */
+        if (partexprs == NIL)
+            return false;

If input_rel->partexprs is NIL, we should "bail" out even before the loop
starts.

+        foreach(lc, partexprs)
+        {
+            Expr       *partexpr = lfirst(lc);
+
+            if (list_member(groupexprs, partexpr))
+            {
+                found = true;
+                break;
+            }
+        }

This looks like a useful piece of general functionality,
list_has_intersection(), which would return a boolean instead of the whole
intersection. I am not sure whether we should add that function to list.c and
use it here.
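
To make the idea concrete, here is a minimal sketch of such a boolean intersection test. It works on plain int arrays rather than PostgreSQL's List type, so it is not the list.c API; int_array_member and arrays_have_intersection are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if value occurs anywhere in items[0..nitems-1]. */
static bool
int_array_member(const int *items, size_t nitems, int value)
{
    for (size_t i = 0; i < nitems; i++)
    {
        if (items[i] == value)
            return true;
    }
    return false;
}

/*
 * Return true as soon as any element of a is also found in b.  Unlike a
 * function that builds the whole intersection, this allocates nothing
 * and stops at the first match.
 */
static bool
arrays_have_intersection(const int *a, size_t na, const int *b, size_t nb)
{
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);

    for (size_t i = 0; i < na; i++)
    {
        if (int_array_member(b, nb, a[i]))
            return true;
    }
    return false;
}
```

In the patch hunk above, the first array would correspond to the expressions of one partition key and the second to the GROUP BY expressions; the early return mirrors the found/break pattern in the quoted loop.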

+ * If none of the partition key matches with any of the GROUP BY

Reword as "... the partition key expressions match with ...."

This isn't a full review of 0007, but I think it covers most of the new
functionality.

[1]: /messages/by-id/CAFjFpRdUz6h6cmFZFYAngmQAX8Zvo+MZsPXidZ077h=gp9bvQw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#46

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Ashutosh Bapat (#45)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Nov 17, 2017 at 7:24 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

+ * this value should be multiplied with cpu_tuple_cost wherever applicable.
+ */
+#define DEFAULT_APPEND_COST_FACTOR 0.5
I am wondering whether we should just define
#define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)
and use this macro everywhere. What else use DEFAULT_APPEND_COST_FACTOR would
have other than multiplying with cpu_tuple_cost?

-1. If you wrap it in a macro like that, future readers of the code
will have to go look up what the macro does. If you just multiply by
DEFAULT_APPEND_COST_FACTOR it will be clear that what's being used is
a multiple of cpu_tuple_cost.

I am not sure whether we should be discussing why this technique performs
better or when it performs better. We don't have similar discussion for
partition-wise join. That paragraph just describes the technique and may be we
want to do the same here.

+1.

+ * might be optimal because of presence of suitable paths with pathkeys or
+ * because the hash tables for most of the partitions fit into the memory.
+ * However, if partition keys are not part of the group by clauses, then we
+ * still able to compute the partial aggregation for each partition and then
+ * finalize them over append. This can either win or lose. It may win if the
+ * PartialAggregate stage greatly reduces the number of groups and lose if we
+ * have lots of small groups.
I have not seen prologue of a function implementing a query optimization
technique explain why that technique improves performance. So I am not sure
whether the comment should include this explanation. One of the reasons being
that the reasons why a technique works might change over the period of time
with the introduction of other techniques, thus obsoleting the comment. But
may be it's good to have it here.

+1 for keeping it.

+ extra.inputRows = 0; /* Not needed at child paths creation */

Why? Comment should be on its own line.

Comments on same line are fine if they are short enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#47

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#45)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Nov 17, 2017 at 5:54 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Nov 15, 2017 at 5:31 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

OK. Done in the attached patch set.

I have rebased all my patches on latest HEAD which is at
7518049980be1d90264addab003476ae105f70d4

Thanks

These are review comments for the last set and I think most of them
apply to the new set as well.

Patches 0001 - 0005 refactor existing code. I haven't
reviewed them in detail, checking whether we have missed anything in
moving the code, but they mostly look fine.

Thanks.

Comments on 0006
/*
+ * cost_append
+ *      Determines and returns the cost of an Append node.
+ *
... clipped portion
+
+    /* Add Append node overhead. */
+    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
I am wondering whether it's really worth creating a new function for a
single
line addition to create_append_path(). I think all we need to change in
create_append_path() is add (cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR *
tuples) to path->total_cost.

Agree. However, there was an existing comment in create_append_path() saying
"We don't bother with inventing a cost_append(), but just do it here", which
implies that at some time in the future we may need it; so why not now, where we are
explicitly costing for an append node. Having a function is good so that,
if required in future, we need to update only this function.
Let me know if you think otherwise; I will make those changes in the next patchset.

+    /* Add MergeAppend node overhead like we do it for the Append node */
+    run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
+
With this change the following comment is no more true. Please remove it.
* extracted tuple. We don't charge cpu_tuple_cost because a
MergeAppend
* node doesn't do qual-checking or projection, so it has less overhead
* than most plan nodes.
*/

This was already fixed in v7.

+/*

+ * Arbitrarily use 50% of the cpu_tuple_cost to cost append node. Note
that

May be reword it as " ... to cost per tuple processing by an append node
..."

Done.

+ * this value should be multiplied with cpu_tuple_cost wherever
applicable.
+ */
+#define DEFAULT_APPEND_COST_FACTOR 0.5
I am wondering whether we should just define
#define APPEND_TUPLE_COST (cpu_tuple_cost * 0.5)
and use this macro everywhere. What else use DEFAULT_APPEND_COST_FACTOR
would
have other than multiplying with cpu_tuple_cost?

As suggested by Robert, I have renamed it to APPEND_CPU_COST_MULTIPLIER in
v7 patchset.
Also, retained the #define for just multiplier as suggested by Robert.

-- test partition matching with N-way join
EXPLAIN (COSTS OFF)
SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM
plt1 t1, plt2 t2, plt1_e t3 WHERE t1.c = t2.c AND ltrim(t3.c, 'A') =
t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN
---------------------------------------------------------------------------------------
+                                   QUERY PLAN
+--------------------------------------------------------------------------------
  Sort
    Sort Key: t1.c, t3.c
    ->  HashAggregate
          Group Key: t1.c, t2.c, t3.c
-         ->  Result
+         ->  Hash Join
+               Hash Cond: (t1.c = t2.c)
                ->  Append
-                     ->  Hash Join
-                           Hash Cond: (t1.c = t2.c)

That's sad. Interestingly this query has an aggregate, so the plan will use
partition-wise join again when partition-wise aggregation patch will be
applied. So may be fine.

Yep. I have modified this testcase and enabled partition-wise aggregation
before this test, so that we will see the desired plan.

- Append  (cost=0.00..0.04 rows=2 width=32)
+ Append  (cost=0.00..0.05 rows=2 width=32)
- Append  (cost=0.00..0.04 rows=2 width=4)
+ Append  (cost=0.00..0.05 rows=2 width=4)
We do have some testcases which print costs. Interesting :). I don't have
any objection to this change.

OK. Thanks.

Comments on 0007
+       <para>
+        Enables or disables the query planner's use of partition-wise grouping
+        or aggregation, which allows  If partition-wise aggregation does not result in the
+        cheapest path, it will still spend time in creating these paths and
+        consume memory which increase linearly with the number of partitions.
+        The default is <literal>off</>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+      </listitem>
+     </varlistentry>
+
Maybe we should word this in the same manner as partition-wise join, like
Enables or disables the query planner's use of partition-wise grouping
or aggregation, which allows aggregation or grouping on a partitioned
table to be spread across the partitions. If the <literal>GROUP
BY</literal> clause includes partition keys, the rows are aggregated at
each partition. Otherwise, partial aggregates computed for each
partition are required to be combined. Because partition-wise
aggregation/grouping can use significantly more CPU time and memory
during planning, the default is <literal>off</literal>.

Done. Thanks for the new wordings.

+
+Partition-wise aggregates/grouping
+----------------------------------

... clipped patch

+In above plan, aggregation is performed after append node which means that the
+whole table is an input for the aggregation node. However, with partition-wise
+aggregation, same query will have plane like:

s/plane/plan/

Oops. Fixed.

+ Append

... clipped patch
+PartialAggregate stage greatly reduces the number of groups and lose if we have
+lots of small groups.
To keep the discussion brief, I suggest we rewrite this paragraph as

----
If GROUP BY clause has all partition keys, all the rows that belong to a given
group come from a single partition and thus aggregates can be finalized
separately for each partition. When the number of groups is far less than the
number of rows being grouped, as is usually the case, the number of rows
processed by an Append node is reduced, as is the size of the hash table or of
the data to be sorted. This usually improves efficiency of the query. If
GROUP BY doesn't contain all the partition keys, partial aggregates can be
computed for each partition, followed by combining partial aggregates from one
or more partitions belonging to the same group to compute the complete
aggregate for each group. This improves efficiency of the query if the number
of groups is far less than the number of rows produced by the scan underneath.
----

I am not sure whether we should be discussing why this technique performs
better or when it performs better. We don't have a similar discussion for
partition-wise join. That paragraph just describes the technique and maybe we
want to do the same here.

OK.
I have removed the text explaining when it performs better.
Please have a look over new text and let me know your views.
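
To make the partial/combine/finalize split described above concrete, here is a minimal sketch for avg() when GROUP BY does not cover the partition key. The AvgState struct and the three avg_* functions are hypothetical stand-ins for an aggregate's transition state, combine function and final function; they are not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transition state for avg(): what a PartialAggregate would emit. */
typedef struct AvgState
{
    double sum;
    long   count;
} AvgState;

/* Partial step: aggregate the rows of one group within one partition. */
static AvgState
avg_partial(const double *vals, size_t n)
{
    AvgState st = {0.0, 0};

    for (size_t i = 0; i < n; i++)
    {
        st.sum += vals[i];
        st.count++;
    }
    return st;
}

/* Combine step: merge the states of the same group from two partitions. */
static AvgState
avg_combine(AvgState a, AvgState b)
{
    AvgState st = {a.sum + b.sum, a.count + b.count};

    return st;
}

/* Final step: turn the merged state into the aggregate value. */
static double
avg_final(AvgState st)
{
    assert(st.count > 0);
    return st.sum / (double) st.count;
}
```

When GROUP BY contains all the partition keys, the combine step is unnecessary because each group lives in exactly one partition, so avg_final() could be applied directly below the Append.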

+ *
+ * extra is the additional information required when we are doing aggregation
+ * or grouping below the append node. In case of partial partition-wise
+ * aggregation on a child node, we need to compute finalized step after the
+ * append, which cannot be done in this function. And thus if we have non-NULL
+ * value for extra, we call create_partition_agg_paths() to create an append
+ * node and finalization, if any.
Maybe we want to just say "extra provides more information about the
partitioned aggregation/grouping, e.g. path target, whether to use partial
aggregate and so on." When present we call create_partition_agg_paths() to add
paths for partition-wise aggregates.

Done.

-        add_path(rel, (Path *) create_append_path(rel, subpaths,
-                                                  rel->reltarget, NULL, 0,
-                                                  partitioned_rels));
+    {
+        if (extra)
+            create_partition_agg_paths(root, rel, subpaths, NIL,
+                                       NIL, NULL, 0,
+                                       partitioned_rels, extra);
+        else
+            add_path(rel, (Path *) create_append_path(rel, subpaths,
+                                                      rel->reltarget, NULL, 0,
+                                                      partitioned_rels));
+    }
I am wondering whether we could write a function to call the appropriate one
out of create_append_path(), create_partition_agg_paths() or
create_merge_append_path() based on the presence of extra and/or pathkeys, and
use it everywhere such a change is made. I don't know whether that will be
worth the code. But there are a handful of places where such diffs are
required.

Done.
Added a function named add_append_path() which does this. The function name
seems too generic; it would be good if you could suggest a few alternatives.

-
-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent) ?
+                                   best_path->path.parent->relids : NULL);
While I can guess why this change is required, it may be better to separate it
into a patch of its own and add some explanation in the commit message for
other reviewers.

Done.
I tried adding comments. See whether they make sense or need further
improvement.

+    /* Copy input rels's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;
I am fine with this change, but Tom may not agree [1]. Maybe we should get his
opinion on this one.

Yep. Agree.
This is mainly required for FDW, as all_baserels does not have relids for the
child relations. Another solution I can think of is to have, say,
rel->fs_relids, set that appropriately, and use it in create_foreignscan_plan().

/*
+     * If input relation is partitioned, check if we can perform
+     * partition-wise grouping and/or aggregation.
+     */
Just like partition-wise join a concise "Apply partition-wise aggregation
technique, if possible." would suffice.

Done.

dNumPartialGroups = get_number_of_groups(root,
cheapest_partial_path->rows,
gd,
-                                             parse->targetList);
+                                             make_tlist_from_pathtarget(target));
Can we guarantee that the output of make_tlist_from_pathtarget() will be the
same as the translation of parse->targetList for the given child? Even if not,
maybe it's fine to pass a slightly different tlist to get_number_of_groups(),
since it doesn't depend upon the exact shape but on the right group column
references. Nonetheless, something to test and verify.

We are interested here in getting the group expressions. get_number_of_groups()
fetches the group expressions from the tlist by checking tle->ressortgroupref,
which is presumably the same as the path target's sortgrouprefs. So I don't
see any issue here.

*
- * Determines whether parallel grouping and/or aggrgation is possible, or not.
+ * Determines whether parallel grouping and/or aggregation is possible, or not.
* Returns true when possible, false otherwise.
Does this hunk belong to one of the refactoring patches or to a separate patch
correcting a typo?

Oops. Moved to the appropriate refactoring patch.

+/*
+ * try_partition_wise_grouping
+ *
+ * If the input relation is partitioned and the partition keys are part of the
+ * group by clauses, each partition produces a different set of groups.
+ * Aggregates within each such group can be computed partition-wise. This
While these sentences are correct, I think the reason why we could compute an
aggregate at the level of each partition is because rows from a given group
belong to a single partition. So, I guess, we have to reword this as

"If the partition keys of input relation are part of group by clause, all the
rows belonging to a given group come from a single partition, each partition
producing a different set of groups. This allows aggregation/grouping over a
partitioned relation to be broken down into aggregation/grouping on each
partition.

If group by clause does not contain all the partition keys, rows from a given
group may be spread across multiple partitions. In that case, we can combine
partial aggregates for a given group across partitions to produce the final
aggregate for that group."

Done. Thanks.

+ * might be optimal because of presence of suitable paths with pathkeys or
+ * because the hash tables for most of the partitions fit into the memory.
+ * However, if partition keys are not part of the group by clauses, then we
+ * still able to compute the partial aggregation for each partition and then
+ * finalize them over append. This can either win or lose. It may win if the
+ * PartialAggregate stage greatly reduces the number of groups and lose if we
+ * have lots of small groups.
I have not seen the prologue of a function implementing a query optimization
technique explain why that technique improves performance. So I am not sure
whether the comment should include this explanation. One reason is that the
reasons why a technique works might change over time with the introduction of
other techniques, thus obsoleting the comment. But maybe it's good to have it
here.

Yep. Retained.

+    /*
+     * Grouping sets plan does not work with an inheritance subtree (see notes
+     * in create_groupingsets_plan). Thus do not handle grouping sets here.
+     */
+    if (query->groupingSets || gd)
+        return;
Even if that restriction is lifted, we won't be able to compute "whole"
grouping sets for each partition, since grouping sets imply multiple group by
clauses, each of which may not have all partition keys. Those sets which have
all partition keys will be computed completely for each partition, but others
will require partial aggregation. I guess we will need to apply partition-wise
aggregation at each derived group by clause and not as a wholesale strategy.

Done.

Anyway, it doesn't look like a good idea to pass an argument (gd) only to
return from that function in case of its presence. Maybe we should handle it
outside this function.

Well, I would like to have it inside the function itself. Let the function
itself do all the necessary checking rather than doing some of them outside.

+
+    /* Nothing to do, if the input relation is not partitioned. */
+    if (!input_rel->part_scheme)
+        return;
+
+    Assert(input_rel->part_rels);
A join between two partitioned tables, with one of them being a dummy
relation, would have part_scheme set but not part_rels (see
try_partition_wise_join()). This assertion would fail in such a case. Have you
tested the case? Maybe we should just test if input_rel->part_rels exists,
similar to generate_partition_wise_join_paths().

Yep. This was already fixed in v7 and also has a covering testcase.

Also, how is a dummy input relation handled in this function? Do we need to
handle it?

Yes, we need to handle it. We need to return without doing PWA when the input
relation itself is dummy.
Added a covering testcase for it.

+ nparts = input_rel->nparts;

+    part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
+    grouped_rel->part_rels = part_rels;
For a partial aggregation, we can't say that the child rels produced here are
partitions of the top grouped relation, so setting part_rels looks wrong. We
should set this only when a full aggregate is obtained from each partition.

Done.

+        scanjoin_target = copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
+        scanjoin_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                                 (Node *) scanjoin_target->exprs,
+                                                                 nappinfos,
+                                                                 appinfos);
Why can't we use input_child_rel->pathtarget? It should be the same as the
translation of its parent's path target. I probably understand that's because
the input rel's path targets have been changed after the underlying join was
planned, a step which is not applied to the individual children. Maybe add a
comment here?

Done. Added comments.

+        child_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                              (Node *) target->exprs,
+                                                              nappinfos,
+                                                              appinfos);
+        partial_target = make_partial_grouping_target(root, target);
+        partial_target->exprs = (List *) adjust_appendrel_attrs(root,
+                                                                (Node *) partial_target->exprs,
+                                                                nappinfos,
+                                                                appinfos);

We need both of these steps for any aggregate since parallel paths will compute
partial paths anyway. If that's correct, maybe we should add a comment?

Done.

+ extra.inputRows = 0; /* Not needed at child paths creation */

Why? Comment should be on its own line.

It is actually not used in create_child_grouping_paths(). But setting that
value has no side effect, so I have set it correctly and removed the comment.

+        if (!create_child_grouping_paths(root, input_child_rel, agg_costs, gd,
+                                         &extra))
+        {
+            /* Could not create path for childrel, return */
+            pfree(appinfos);
+            return;
+        }
Can we detect this condition and bail out even before planning any of the
children? It looks wasteful to try to plan children only to bail out in this
case.

I don't think so. It is effectively unreachable and was added just for safety
in case we are unable to create a child path. The bail-out conditions cannot
be evaluated at the beginning. Do you think an Assert() will be good here?
Am I missing something?

+    /* Nothing to do if we have no live children */
+    if (live_children == NIL)
+        return;
A parent relation with all dummy children will also be dummy. Maybe we should
mark the parent dummy using mark_dummy_rel(), similar to
generate_partition_wise_join_paths().

If parent is dummy, then we are not doing PWA at all. So no need to mark the
parent grouped_rel as dummy, I guess.
However, if some of the children are dummy, I am marking the corresponding
upper rel as dummy too.
Actually, this condition is never going to be true, as you said correctly
that "A parent relation with all dummy children will also be dummy". Should
we have an Assert() instead?

+/*
+ * have_grouping_by_partkey
+ *
Somehow this name sounds like it would return true when GROUP BY contains only
the partition key. Maybe rename it as group_by_has_partkey? to indicate the

OK. Renamed.

+ * Returns true, if partition keys of the given relation are part of the
+ * GROUP BY clauses, false otherwise.
Reword as " ... if all the partition keys of ... "

Done.

+static bool
+have_grouping_by_partkey(RelOptInfo *input_rel, PathTarget *target,
+                         List *groupClause)
+{
+    List       *tlist = make_tlist_from_pathtarget(target);
+    List       *groupexprs = get_sortgrouplist_exprs(groupClause, tlist);
Have we tested the case with a multi-level partitioned table and children with
a different order of partition key columns?

I have a testcase for a multi-level partitioned table.
However, I did not understand what you mean by "children with different
order of partition key columns". I had a look over the tests in
partition_join.sql and it seems that I have covered all those scenarios.
Please have a look over the testcases added for PWA and let me know the
scenarios missing; I will add them then.

+        partexprs = input_rel->partexprs ? input_rel->partexprs[cnt] : NIL;
+
+        /* Rule out early, if there are no partition keys present */
+        if (partexprs == NIL)
+            return false;

If input_rel->partexprs is NIL, we should "bail" out even before the loop
starts.

Done.

+        foreach(lc, partexprs)
+        {
+            Expr       *partexpr = lfirst(lc);
+
+            if (list_member(groupexprs, partexpr))
+            {
+                found = true;
+                break;
+            }
+        }
This looks like a useful piece of general functionality,
list_has_intersection(), which would return a boolean instead of the whole
intersection. I am not sure whether we should add that function to list.c and
use it here.

Sounds good.
But for now, I am keeping it as part of this feature itself.
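
As an illustration of the helper being discussed, here is a minimal, self-contained sketch. It assumes expressions can be represented by plain int ids in a hypothetical ExprList struct; the real code works on PostgreSQL List and Node pointers and compares with list_member(), so the names and types below are stand-ins only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: an expression is just an int id here, not a Node *. */
typedef struct ExprList
{
    const int *items;
    size_t     len;
} ExprList;

/* Linear membership test, playing the role of list_member(). */
static bool
expr_list_member(ExprList list, int item)
{
    for (size_t i = 0; i < list.len; i++)
        if (list.items[i] == item)
            return true;
    return false;
}

/* True as soon as one element of a is found in b; no intersection list is built. */
static bool
list_has_intersection(ExprList a, ExprList b)
{
    assert(a.items != NULL || a.len == 0);
    for (size_t i = 0; i < a.len; i++)
        if (expr_list_member(b, a.items[i]))
            return true;
    return false;
}

/* Every partition key must have at least one of its expressions in GROUP BY. */
static bool
group_by_has_partkey(const ExprList *partexprs, size_t partnatts,
                     ExprList groupexprs)
{
    /* Bail out before the loop when there are no partition key expressions. */
    if (partexprs == NULL || partnatts == 0)
        return false;

    for (size_t cnt = 0; cnt < partnatts; cnt++)
        if (!list_has_intersection(partexprs[cnt], groupexprs))
            return false;
    return true;
}
```

Returning on the first match avoids allocating an intersection list that the caller would only test for emptiness.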

+ * If none of the partition key matches with any of the GROUP BY

Reword as "... the partition key expressions match with ...."

Done.

This isn't a full review of 0007, but I think it covers most of the new
functionality.

[1] /messages/by-id/CAFjFpRdUz6h6cmFZFYAngmQAX8Zvo+MZsPXidZ077h=gp9bvQw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Thanks for the detailed review, Ashutosh.

Let me know if I missed any comment to be fixed.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#48

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#47)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Agree. However, there was an existing comment in create_append_path() saying
"We don't bother with inventing a cost_append(), but just do it here", which
implies that at some time in the future we may need it; so why not now, when
we are explicitly costing an append node? Having a function is good so that,
if required in future, we need to update only this function.
Let me know if you think otherwise; I will make those changes in the next
patchset.

I don't read that comment as something we will do in future. I
don't think the amount of changes that this patch introduces just for
adding one more line of code is justified. There's anyway only one
place where we are costing append, so it's not that the new function
avoids code duplication. Although I am happy to defer this to the
committer, if you think that we need a separate function.

As suggested by Robert, I have renamed it to APPEND_CPU_COST_MULTIPLIER in
v7 patchset.
Also, retained the #define for just multiplier as suggested by Robert.

Ok.

Anyway, it doesn't look like a good idea to pass an argument (gd) only to
return from that function in case of its presence. Maybe we should handle it
outside this function.

Well, I would like to have it inside the function itself. Let the function
itself do all the necessary checking rather than doing some of them outside.

We will leave this to the committer. I don't like that style, but it's
also good to expect a function to do all related work.

+        if (!create_child_grouping_paths(root, input_child_rel, agg_costs, gd,
+                                         &extra))
+        {
+            /* Could not create path for childrel, return */
+            pfree(appinfos);
+            return;
+        }
Can we detect this condition and bail out even before planning any of the
children? It looks wasteful to try to plan children only to bail out in this
case.
I don't think so. It is effectively unreachable and was added just for safety
in case we are unable to create a child path. The bail-out conditions cannot
be evaluated at the beginning. Do you think an Assert() will be good here?
Am I missing something?

An Assert would help. If it's something that should not happen, we
should try catching that rather than silently ignoring it.

+    /* Nothing to do if we have no live children */
+    if (live_children == NIL)
+        return;
A parent relation with all dummy children will also be dummy. Maybe we should
mark the parent dummy using mark_dummy_rel(), similar to
generate_partition_wise_join_paths().
If parent is dummy, then we are not doing PWA at all. So no need to mark the
parent grouped_rel as dummy, I guess.
However, if some of the children are dummy, I am marking the corresponding
upper rel as dummy too.
Actually, this condition is never going to be true, as you said correctly
that "A parent relation with all dummy children will also be dummy". Should
we have an Assert() instead?

Yes.

I have a testcase for a multi-level partitioned table.
However, I did not understand what you mean by "children with different
order of partition key columns". I had a look over the tests in
partition_join.sql and it seems that I have covered all those scenarios.
Please have a look over the testcases added for PWA and let me know the
scenarios missing; I will add them then.

By children with different order of partition key columns, I meant
something like this

parent(a int, b int, c int) partition by (a), child1(b int, c int, a
int) partition by b, child1_1 (c int, a int, b int);

where the attribute numbers of the partition keys in different
children are different.

This looks like a useful piece of general functionality,
list_has_intersection(), which would return a boolean instead of the whole
intersection. I am not sure whether we should add that function to list.c and
use it here.

Sounds good.
But for now, I am keeping it as part of this feature itself.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#49

Rajkumar Raghuwanshi

rajkumar.raghuwanshi@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#47)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Let me know if I missed any comment to be fixed.

Hi,

I have applied v8 patches on commit id 8735978e7aebfbc499843630131c18d1f7346c79,
and getting below observation, please take a look.

Observation:
"when joining a foreign partition table with local partition table
getting wrong output
with partition_wise_join enabled, same is working fine on PG-head
without aggregates patch."

Test-case:
CREATE EXTENSION postgres_fdw;
CREATE SERVER pwj_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS
(dbname 'postgres',port '5432',use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER pwj_server;

CREATE TABLE fplt1 (a int, c text) PARTITION BY LIST(c);
CREATE TABLE fplt1_p1 (a int, c text);
CREATE TABLE fplt1_p2 (a int, c text);
CREATE FOREIGN TABLE ftplt1_p1 PARTITION OF fplt1 FOR VALUES IN
('0000', '0001', '0002', '0003') SERVER pwj_server OPTIONS (TABLE_NAME
'fplt1_p1');
CREATE FOREIGN TABLE ftplt1_p2 PARTITION OF fplt1 FOR VALUES IN
('0004', '0005', '0006', '0007') SERVER pwj_server OPTIONS (TABLE_NAME
'fplt1_p2');
INSERT INTO fplt1_p1 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(0, 199, 2) i;
INSERT INTO fplt1_p2 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(200, 398, 2) i;

CREATE TABLE lplt2 (a int, c text) PARTITION BY LIST(c);
CREATE TABLE lplt2_p1 PARTITION OF lplt2 FOR VALUES IN ('0000',
'0001', '0002', '0003');
CREATE TABLE lplt2_p2 PARTITION OF lplt2 FOR VALUES IN ('0004',
'0005', '0006', '0007');
INSERT INTO lplt2 SELECT i, to_char(i%8, 'FM0000') FROM
generate_series(0, 398, 3) i;

SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
and t1.a = t2.a) WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
t2.c;
c | c | count
------+------+-------
0000 | 0000 | 1
0004 | 0004 | 1
0006 | 0006 | 1
(3 rows)

SET enable_partition_wise_join = on;
SELECT t1.c, t2.c,count(*) FROM fplt1 t1 JOIN lplt2 t2 ON (t1.c = t2.c
and t1.a = t2.a) WHERE t1.a % 25 = 0 GROUP BY 1,2 ORDER BY t1.c,
t2.c;
c | c | count
------+------+-------
0000 | 0000 | 1
0004 | 0004 | 1
(2 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

#50

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Rajkumar Raghuwanshi (#49)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Nov 28, 2017 at 12:37 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwanshi@enterprisedb.com> wrote:

On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Let me know if I missed any comment to be fixed.

Hi,

I have applied v8 patches on commit id 8735978e7aebfbc499843630131c18d1f7346c79,
and getting below observation, please take a look.

Observation:
"when joining a foreign partition table with local partition table
getting wrong output
with partition_wise_join enabled, same is working fine on PG-head
without aggregates patch."

I have observed the same behavior on the master branch too when a
partition-wise join path is selected, irrespective of this patch-set.

This is happening because the data in the foreign tables does not comply with
the partitioning constraints.
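
To spell out the arithmetic behind this, here is a small sketch. It assumes my reading of the test case: rows are inserted directly into fplt1_p1 and fplt1_p2 with c = to_char(i%8, 'FM0000'), so they are never checked against the list bounds of the foreign partitions. The functions fits_partition() and count_violations() are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* List bounds from the test case: p1 accepts '0000'..'0003', p2 accepts '0004'..'0007'. */
static bool
fits_partition(int key, int part)
{
    return part == 1 ? (key >= 0 && key <= 3) : (key >= 4 && key <= 7);
}

/*
 * Count the rows that generate_series(start, stop, step) with c = i % 8
 * would place outside the bounds of partition `part`.
 */
static int
count_violations(int start, int stop, int step, int part)
{
    int bad = 0;

    assert(step > 0);
    for (int i = start; i <= stop; i += step)
        if (!fits_partition(i % 8, part))
            bad++;
    return bad;
}
```

Under these assumptions half of the rows in each remote table fall outside the bounds of the partition that points at it. In particular, a = 150 is stored in fplt1_p1 with c = '0006', while the matching lplt2 row is routed to lplt2_p2, so a partition-wise join that pairs the first partitions with each other would not find it. That appears to be the '0006' row missing from the second result.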

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#51

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Jeevan Chalke (#50)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Nov 28, 2017 at 5:50 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

[snip]

This is still a hot topic so I am moving it to next CF.
--
Michael

#52

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#47)

Re: [HACKERS] Partition-wise aggregation/grouping

Continuing with review of 0007.

+
+    /* Copy input rels's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;

Isn't this done in fetch_upper_rel()? Why do we need it here?
There's also a similar hunk in create_grouping_paths() which doesn't look
appropriate. I guess, you need relids in grouped_rel->relids for FDW. There are
two ways to do this: 1. set grouped_rel->relids for parent upper rel as well,
but then we should pass relids to fetch_upper_rel() instead of setting those
later. 2. For a parent upper rel, in create_foreignscan_plan(), set relids to
all_baserels, if upper_rel->relids is NULL and don't set relids for a parent
upper rel. I am fine with either of those.

+            /* partial phase */
+            get_agg_clause_costs(root, (Node *) partial_target->exprs,
+                                 AGGSPLIT_INITIAL_SERIAL,
+                                 &agg_partial_costs);

IIUC, the costs for evaluating aggregates would not change per child. They
won't be different for parent and any of the children. So, we should be able to
calculate those once, save in "extra" and use repeatedly.

+        if (can_sort)
+        {
+            Path       *path = cheapest_path;
+
+            if (!(pathkeys_contained_in(root->group_pathkeys,
+                                        path->pathkeys)))
[ .. clipped patch .. ]
+                                           NIL,
+                                           dNumGroups));
+        }

We create two kinds of paths: partial paths for parallel query, and partial
aggregation paths when group keys do not have partition keys. The comments and
code use "partial" to mean both things, which is rather confusing. Maybe we
should use the term "partial aggregation" explicitly wherever it means that, in
comments and in variable names.

I still feel that create_grouping_paths() and create_child_grouping_paths()
have a lot of common code. While I can see that there are some pockets in
create_grouping_paths() which are not required in create_child_grouping_paths()
and vice-versa, may be we should create only one function and move those
pockets under "if (child)" or "if (parent)" kind of conditions. It will be a
maintenance burden to keep these two functions in sync in future. If we do not
keep them in sync, that will introduce bugs.

+
+/*
+ * create_partition_agg_paths
+ *
+ * Creates append path for all the partitions and adds into the grouped rel.

I think you want to write "Creates an append path containing paths from all the
child grouped rels and adds into the given parent grouped rel".

+ * For partial mode we need to add a finalize agg node over append path before
+ * adding a path to grouped rel.
+ */
+void
+create_partition_agg_paths(PlannerInfo *root,
+                           RelOptInfo *grouped_rel,
+                           List *subpaths,
+                           List *partial_subpaths,

Why do we have these two as separate arguments? I don't see any call to
create_partition_agg_paths() through add_append_path() passing both of them
non-NULL simultaneously. Maybe you want to use a single list, subpaths, and a
boolean indicating whether it's a list of partial paths or regular paths.

+
+    /* For non-partial path, just create a append path and we are done. */
This is the kind of confusion, I am talking about above. Here you have
mentioned "non-partial path" which may mean a regular path but what you
actually mean by that term is a path representing partial aggregates.

+    /*
+     * Partial partition-wise grouping paths.  Need to create final
+     * aggregation path over append relation.
+     *
+     * If there are partial subpaths, then we need to add gather path before we
+     * append these subpaths.

More confusion here.

+     */
+    if (partial_subpaths)
+    {
+        ListCell   *lc;
+
+        Assert(subpaths == NIL);
+
+        foreach(lc, partial_subpaths)
+        {
+            Path       *path = lfirst(lc);
+            double        total_groups = path->rows * path->parallel_workers;
+
+            /* Create gather paths for partial subpaths */
+            Path *gpath = (Path *) create_gather_path(root, grouped_rel, path,
+                                                      path->pathtarget, NULL,
+                                                      &total_groups);
+
+            subpaths = lappend(subpaths, gpath);

Using the argument variable is confusing and that's why you need two different
List variables. Instead probably you could have another variable local to this
function to hold the gather subpaths.

AFAIU, the Gather paths that this code creates have their parent set to the
parent grouped rel. That's not correct. These partial paths come from children
of the grouped rel and each Gather is producing rows corresponding to one child
of the grouped rel. So the Gather path's parent should be set to the
corresponding child and not to the parent grouped rel.

This code creates plans where there are multiple Gather nodes under an Append
node. AFAIU, the workers assigned to one Gather node can be reused only after
that Gather node finishes. Having multiple Gather nodes under an Append means
that every worker will be idle from the time that worker finishes its work
until the last worker finishes. That doesn't seem to be an optimal use of
workers. The plan that we create with Gather on top of Append seems to be
better. So, we should avoid creating one Gather node per child plan. Have we
tried to compare the performance of these two plans?

+        if (!IsA(apath, MergeAppendPath) && root->group_pathkeys)
+        {
+            spath = (Path *) create_sort_path(root,
+                                              grouped_rel,
+                                              apath,
+                                              root->group_pathkeys,
+                                              -1.0);
+        }

The code here assumes that a MergeAppend path will always have pathkeys
matching group_pathkeys. I believe that's true but probably we should have an
Assert to make it clear and add comments. If that's not true, we will need to
sort the output of MergeAppend OR discard MergeAppend paths which do not have
pathkeys matching group_pathkeys.
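
The decision described above can be sketched in isolation. This is a minimal standalone model, not planner code: sort keys are plain integers, and keys_contained_in() and needs_explicit_sort() are hypothetical helpers invented for illustration. It only shows the rule "skip the Sort for a MergeAppend whose ordering covers the grouping keys, otherwise sort".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: an ordering is an array of integer key ids.  Returns true
 * when the required keys form a leading prefix of the provided ordering.
 */
bool
keys_contained_in(const int *required, size_t nrequired,
                  const int *provided, size_t nprovided)
{
    size_t      i;

    assert(required != NULL || nrequired == 0);
    assert(provided != NULL || nprovided == 0);

    if (nrequired > nprovided)
        return false;
    for (i = 0; i < nrequired; i++)
    {
        if (required[i] != provided[i])
            return false;
    }
    return true;
}

/*
 * Decide whether an explicit Sort is needed above an append-style path.
 * A merge-append path is expected to be suitably sorted already; if it
 * is not, the caller has to sort it (or discard the path).
 */
bool
needs_explicit_sort(bool is_merge_append,
                    const int *group_keys, size_t ngroup,
                    const int *path_keys, size_t npath)
{
    if (ngroup == 0)
        return false;           /* no grouping keys, nothing to sort by */
    if (is_merge_append &&
        keys_contained_in(group_keys, ngroup, path_keys, npath))
        return false;           /* existing ordering already suffices */
    return true;
}
```

In the patch itself the equivalent check would presumably compare the MergeAppend path's pathkeys with root->group_pathkeys inside an Assert, as suggested above.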

diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b422050..1941468 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ UnlistenStmt
 UnresolvedTup
 UnresolvedTupData
 UpdateStmt
+UpperPathExtraData
 UpperRelationKind
 UpperUniquePath
 UserAuth

Do we commit this file as part of the feature?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#53

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Ashutosh Bapat (#52)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This code creates plans where there are multiple Gather nodes under an Append
node.

We should avoid that. Starting and stopping workers is inefficient,
and precludes things like turning the Append into a Parallel Append.

AFAIU, the workers assigned to one gather node can be reused until that
Gather node finishes. Having multiple Gather nodes under an Append mean that
every worker will be idle from the time that worker finishes the work till the
last worker finishes the work.

No, workers will exit as soon as they finish. They don't hang around idle.

index b422050..1941468 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ UnlistenStmt
UnresolvedTup
UnresolvedTupData
UpdateStmt
+UpperPathExtraData
UpperRelationKind
UpperUniquePath
UserAuth

Do we commit this file as part of the feature?

Andres and I regularly commit such changes; Tom rejects them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#54

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Robert Haas (#53)

Re: [HACKERS] Partition-wise aggregation/grouping

On Sat, Dec 2, 2017 at 4:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This code creates plans where there are multiple Gather nodes under an Append
node.

We should avoid that. Starting and stopping workers is inefficient,
and precludes things like turning the Append into a Parallel Append.

Ah, I didn't think about it. Thanks for bringing it up.

AFAIU, the workers assigned to one gather node can be reused until that
Gather node finishes. Having multiple Gather nodes under an Append mean that
every worker will be idle from the time that worker finishes the work till the
last worker finishes the work.

No, workers will exit as soon as they finish. They don't hang around idle.

Sorry, I think I used the wrong word, "idle". I meant that if a worker
finishes and exits, the query can't use that worker slot until the
next Gather node starts. But as you pointed out, starting and stopping
a worker is costlier than the cost of not using the slot. So we should
avoid such plans.
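
To make the trade-off concrete, here is a toy count of worker launches for the two plan shapes. It assumes that every Gather launches its full complement of workers and that workers are never shared between Gather nodes; the function names and the model are illustrative assumptions, not planner or executor code.

```c
#include <assert.h>

/* One Gather per child under an Append: each Gather starts its own workers. */
int
launches_gather_per_child(int nchildren, int workers_per_gather)
{
    assert(nchildren >= 0 && workers_per_gather >= 0);
    return nchildren * workers_per_gather;
}

/* A single Gather on top of the Append: workers are started only once. */
int
launches_gather_over_append(int nchildren, int workers_per_gather)
{
    assert(nchildren >= 0 && workers_per_gather >= 0);
    (void) nchildren;           /* the child count does not matter here */
    return workers_per_gather;
}
```

Under these assumptions, 3 partitions with 2 workers per Gather cost 6 worker launches in the first shape and 2 in the second.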

index b422050..1941468 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ UnlistenStmt
UnresolvedTup
UnresolvedTupData
UpdateStmt
+UpperPathExtraData
UpperRelationKind
UpperUniquePath
UserAuth
Do we commit this file as part of the feature?
Andres and I regularly commit such changes; Tom rejects them.

We will leave it to the committer to decide what to do with this hunk.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#55

legrand legrand

legrand_legrand@hotmail.com

about 8 years ago

In reply to: Jeevan Chalke (#50)

Re: Partition-wise aggregation/grouping

Hello,

I'm testing Partition wise and found a strange result in V8 patch
see init_star_schema_agg.sql
<http://www.postgresql-archive.org/file/t348768/init_star_schema_agg.sql>

Explain gives
-> Seq Scan on facts_p2 (...) (actual time=0.012..0.012 rows=1 loops=1)
for partitions that are joined with empty partitions

I was expecting a message saying that partition facts_p2 was not accessed

Am I wrong ?
Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#56

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: legrand legrand (#55)

Re: Partition-wise aggregation/grouping

On Fri, Dec 8, 2017 at 3:08 AM, legrand legrand <legrand_legrand@hotmail.com>
wrote:

Hello,

I'm testing Partition wise and found a strange result in V8 patch
see init_star_schema_agg.sql
<http://www.postgresql-archive.org/file/t348768/init_star_schema_agg.sql>

Explain gives
-> Seq Scan on facts_p2 (...) (actual time=0.012..0.012 rows=1 loops=1)
for partitions that are joined with empty partitions

I was expecting a message saying that partition facts_p2 was not accessed

Am I wrong ?

Is this related to partition-wise aggregation, given that you say you found
this behaviour with the v8 patch?

I have tried your testcase on master and see similar plan.

I had a look over the plan and I did not see any empty relation at the
planning time.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 66449694

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

#57

legrand legrand

legrand_legrand@hotmail.com

about 8 years ago

In reply to: Jeevan Chalke (#56)

Re: Partition-wise aggregation/grouping

Thank you for the answer
This was a misunderstanding of hash join behaviour on my side.

That means that at least one line is read from the facts_p2 partition even if
the second table partition of the hash join operation is empty.

I will remember it now ;o)

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#58

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#54)

Re: [HACKERS] Partition-wise aggregation/grouping

Here are review comments for 0009
Only full aggregation is pushed on the remote server.

I think we can live with that for a while, but we need to be able to push down
partial aggregates to the foreign server. I agree that it needs some
infrastructure to serialize and deserialize the partial aggregate values, and
that we should support unpartitioned aggregation first and then work on
partitioned aggregates. That is definitely a separate piece of work.
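
As a rough picture of what that infrastructure has to do, the sketch below models the partial state of an integer avg() as a (sum, count) pair that is flattened to bytes on the remote side, rebuilt locally, combined with another partition's partial state, and then finalized. The struct, the function names, and the byte layout are illustrative assumptions, not postgres_fdw or core APIs; a real wire format would also have to be portable across machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Partial (transition) state for avg over integers. */
typedef struct AvgState
{
    int64_t     sum;
    int64_t     count;
} AvgState;

#define AVG_STATE_BYTES (2 * sizeof(int64_t))

/* Flatten the state into a caller-supplied buffer (native byte order). */
void
avg_serialize(const AvgState *state, unsigned char *buf)
{
    memcpy(buf, &state->sum, sizeof(int64_t));
    memcpy(buf + sizeof(int64_t), &state->count, sizeof(int64_t));
}

/* Rebuild a state from bytes produced by avg_serialize(). */
AvgState
avg_deserialize(const unsigned char *buf)
{
    AvgState    state;

    memcpy(&state.sum, buf, sizeof(int64_t));
    memcpy(&state.count, buf + sizeof(int64_t), sizeof(int64_t));
    return state;
}

/* Combine step: merge one partition's partial state into another. */
void
avg_combine(AvgState *into, const AvgState *from)
{
    into->sum += from->sum;
    into->count += from->count;
}

/* Final step: produce the average; the caller must ensure count > 0. */
double
avg_final(const AvgState *state)
{
    assert(state->count > 0);
    return (double) state->sum / (double) state->count;
}
```

The point of the split is that only the serialized state crosses the connection, while the combine and final steps run once on the local server.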

+-- ===================================================================
+-- test partition-wise-aggregates
+-- ===================================================================
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (a int, b int, c text);
+CREATE TABLE pagg_tab_p2 (a int, b int, c text);
+CREATE TABLE pagg_tab_p3 (a int, b int, c text);

Like partition-wise join testcases please use LIKE so that it's easy to change
the table schema if required.

+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;

We have to do this because INSERT tuple routing to a foreign partition is not
supported right now. Somebody has to remember to change this to a single
statement once that's done.

+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;

I thought this was not needed: when you ANALYZE the partitioned table, it would
analyze the partitions as well. But I see that the partition-wise join tests
also ANALYZE the foreign partitions separately. When I ran ANALYZE on a
partitioned table with foreign partitions, statistics were updated only for the
local tables (partitioned and partitions). Of course this is separate work, but
it probably needs to be fixed.

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled

s/when/with/

+-- Plan when partition-wise-agg is enabled

s/when/with/

+ -> Append

Just like ForeignScan node's Relations tell what kind of ForeignScan this is,
may be we should annotate Append to tell whether the children are joins,
aggregates or relation scans. That might be helpful. Of course as another
patch.

+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING
avg(b) < 25 ORDER BY 1;
+ a  | sum  | min | count
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
[ ... clipped ...]
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+(15 rows)

May be we want to reduce the number of rows to a few by using a stricter HAVING
clause?

+
+-- When GROUP BY clause not matches with PARTITION KEY.

... clause does not match ...

+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+
QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)),
(max(fpagg_tab_p1.a)), (count(*))
+               ->  Partial HashAggregate
[ ... clipped ... ]
+                     Output: fpagg_tab_p3.b, PARTIAL
avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*),
PARTIAL sum(fpagg_tab_p3.a)
+                     Group Key: fpagg_tab_p3.b
+                     ->  Foreign Scan on public.fpagg_tab_p3
+                           Output: fpagg_tab_p3.b, fpagg_tab_p3.a
+                           Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(26 rows)

I think we are interested in the overall shape of the plan and not the details
of Remote SQL etc. So, maybe turn off VERBOSE. This comment applies to an
earlier plan with enable_partition_wise_agg = false.

+
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING
sum(a) < 800 ORDER BY 1;
+ b  |         avg         | max | count
+----+---------------------+-----+-------
+  0 | 10.0000000000000000 |  20 |    60
+  1 | 11.0000000000000000 |  21 |    60
[... clipped ...]
+ 42 | 12.0000000000000000 |  22 |    60
+ 43 | 13.0000000000000000 |  23 |    60
+(20 rows)

Since the aggregates were not pushed down, I doubt we should be testing the
output. But this test is good to check partial aggregates over foreign
partition scans, which we don't have in postgres_fdw.sql I think. So, may be
add it as a separate patch?

Can you please add a test that references a whole-row; that usually causes
trouble.

-    if (root->hasHavingQual && query->havingQual)
+    if (root->hasHavingQual && fpinfo->havingQual)

This is not exactly a problem with your patch, but why do we need to check both
the boolean and the actual clauses? If the boolean is true, query->havingQual
should be non-NULL and NULL otherwise.

     /* Grouping information */
     List       *grouped_tlist;
+    PathTarget *grouped_target;
+    Node       *havingQual;

I think we don't need havingQual as a separate member. foreign_grouping_ok()
separates the clauses in havingQual into shippable and non-shippable clauses
and saves in local_conditions and remote_conditions. Probably we want to use
those instead of adding a new member.

index 04e43cc..c8999f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function)
(PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
                                                UpperRelationKind stage,
                                                RelOptInfo *input_rel,
-                                               RelOptInfo *output_rel);
+                                               RelOptInfo *output_rel,
+                                               UpperPathExtraData *extra);

Probably this comment belongs to 0007, but it's in this patch that it becomes
clear how invasive UpperPathExtraData changes are. While UpperPathExtraData has
upper paths in the name, all of its members are related to grouping. That's
fine since we only support partition-wise aggregate and not the other upper
operations. But if we were to do that in future, which of these members would
be applicable to other upper relations? inputRows, pathTarget,
partialPathTarget may be applicable to other upper rels as well. can_sort,
can_hash may be applicable to DISTINCT, SORT relations. isPartial and
havingQual will be applicable only to Grouping/Aggregation. So, may be it's ok,
and like RelOptInfo we may separate them by comments.
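
A possible layout along those lines is sketched below. Only the member names come from the discussion above; the member types, the opaque stand-in typedefs, and the init_upper_path_extra() helper are assumptions made so that the fragment compiles on its own, and the grouping into three comment-separated sections is just one reading of which members apply where.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-ins for the planner types; only pointers are used here. */
typedef struct PathTarget PathTarget;
typedef struct Node Node;

typedef struct UpperPathExtraData
{
    /* Possibly applicable to any upper relation */
    double      inputRows;
    PathTarget *pathTarget;
    PathTarget *partialPathTarget;

    /* Possibly applicable to grouping, DISTINCT and sort relations */
    bool        can_sort;
    bool        can_hash;

    /* Applicable to grouping/aggregation only */
    bool        isPartial;
    Node       *havingQual;
} UpperPathExtraData;

/* Hypothetical helper: reset every member to zero/false/NULL. */
void
init_upper_path_extra(UpperPathExtraData *extra)
{
    assert(extra != NULL);
    *extra = (UpperPathExtraData) {0};
}
```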

Another problem with that structure is its name doesn't mention that the
structure is used only for child upper relations, whereas the code assumes that
if extra is not present it's a parent upper relation. May be we want to rename
it to that effect or always use it whether for a parent or a child relation.

We may want to rename pathTarget and partialPathTarget as relTarget and
partialRelTarget since those targets are not specific to any path, but will be
applicable to all the paths created for that rel.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#59

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#58)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Here are review comments for 0009

Thank you, Ashutosh for the detailed review so far.

I am working on your reviews but since parallel Append is now committed,
I need to re-base my changes over it and need to resolve the conflicts too.

Once done, I will submit the new patch-set fixing these and earlier review
comments.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#60

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#59)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Dec 13, 2017 at 6:37 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Here are review comments for 0009

Thank you, Ashutosh for the detailed review so far.

I am working on your reviews but since parallel Append is now committed,
I need to re-base my changes over it and need to resolve the conflicts too.

Once done, I will submit the new patch-set fixing these and earlier review
comments.

Sure no problem. Take your time. Here's set of comments for 0008. That
ends the first read of all the patches (2nd reading for the core
changes)

+-- Also, disable parallel paths.
+SET max_parallel_workers_per_gather TO 0;

If you enable parallel aggregation, partition-wise aggregation paths won't be
chosen for smaller data. I think this is the reason why you are disabling
parallel query. But we should probably explain that in a comment. It would be
better if we could come up with testcases that don't disable parallel query.
Since parallel append is now committed, maybe it can help.

+
+-- Check with multiple columns in GROUP BY, order in target-list is reversed
+EXPLAIN (COSTS OFF)
+SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
+                   QUERY PLAN
+-------------------------------------------------
+ Append
+   ->  HashAggregate
+         Group Key: pagg_tab_p1.a, pagg_tab_p1.c
+         ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(10 rows)

Why do we need this testcase?

+
+-- Test when input relation for grouping is dummy
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+           QUERY PLAN
+--------------------------------
+ HashAggregate
+   Group Key: pagg_tab.c
+   ->  Result
+         One-Time Filter: false
+(4 rows)

Not part of your patch, I am wondering if we can further optimize this plan by
converting HashAggregate to Result (One-time Filter: false) and the aggregate
target. Just an idea.

+
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ c | sum
+---+-----
+(0 rows)

I think we also need a case when the child input relations are marked dummy and
then the parent is marked dummy. Just use a condition with partkey = <none of
list bounds>.

+
+-- Check with SORTED paths. Disable hashagg to get group aggregate

Suggest: "Test GroupAggregate paths by disabling hash aggregates."

+-- When GROUP BY clause matches with PARTITION KEY.

I don't think we need "with", and just extend the same sentence with "complete
aggregation is performed for each partition"

+-- Should choose full partition-wise aggregation path

suggest: "Should choose full partition-wise GroupAggregate plan", but I guess
with the above suggestion, this sentence is not needed.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path

Similar suggestions as above.

+-- No aggregates, but still able to perform partition-wise aggregates

That's a funny construction. May be "Test partition-wise grouping without any
aggregates".

We should try some output for this query.

+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+                   QUERY PLAN
+-------------------------------------------------
+ Group
+   Group Key: pagg_tab_p1.a
+   ->  Merge Append
+         Sort Key: pagg_tab_p1.a
+         ->  Group
+               Group Key: pagg_tab_p1.a
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a
+                     ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)

It's strange that we do not annotate partial grouping as Partial. Does not look
like a bug in your patch. Do we get similar output with parallel grouping?

+
+-- ORDERED SET within aggregate
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
+                               QUERY PLAN
+------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
+   ->  GroupAggregate
+         Group Key: pagg_tab_p1.a
+         ->  Sort
+               Sort Key: pagg_tab_p1.a
+               ->  Append
+                     ->  Seq Scan on pagg_tab_p1
+                     ->  Seq Scan on pagg_tab_p2
+                     ->  Seq Scan on pagg_tab_p3
+(10 rows)

pagg_tab is partitioned by column c. So, not having it in GROUP BY
itself might produce this plan if Partial parallel aggregation is expensive.
When testing negative tests like this GROUP BY should always have the partition
key.

In case of full aggregation, since all the rows that belong to the same group
come from the same partition, having an ORDER BY doesn't make any difference.
We should support such a case.

+INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
+INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;

spaces around % operator?

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Should choose full partition-wise aggregation path.

Probably we should just club single table and join cases under one set of
comments rather than repeating those? Create the tables once at the beginning
of the test file and group together the queries under one comment head.

+-- Disable mergejoin to get hash aggregate.
+SET enable_mergejoin TO false;

Why? We have tested that once.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path.
+-- Also check with SORTED paths. Disable hashagg to get group aggregate.
+SET enable_hashagg TO false;

Same as above. The two of those clubbed together will produce one hash and one
group plan. That will cover it.

+-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
+-- aggregation
+-- LEFT JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column
+EXPLAIN (COSTS OFF)
+SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x =
b.y GROUP BY 1 ORDER BY 1 NULLS LAST;

May be you should explicitly use GROUP BY b.y in all of these queries.

+-- FULL JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column

In case of a FULL JOIN, partition keys from the joining relations land on the
nullable side; there is no key on the non-nullable side, so an aggregation on
top of a FULL JOIN will always be partial partition-wise aggregation.

+
+-- Empty relations on LEFT side, no partition-wise agg plan.

Suggest: Empty join relation because of empty outer side. I don't think we are
writing a negative test to check whether partition-wise agg plan is not chosen.
We are testing the case when the join relation is empty.

+
+EXPLAIN (COSTS OFF)
+SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a,
(a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3;

Keep this or the previous one; both are overkill. I will vote for this one, but
it's up to you.

May be add a testcase with the partition keys themselves switched; output just
the plan.

+-- Test with multi-level partitioning scheme
+-- Partition-wise aggregation is tried only on first level.
[ ... clipped ... ]
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY

This seems to contradict the previous comment. Maybe club them together
and say "Partition-wise aggregation with full aggregation only at the first
level" and move that whole comment down.

+
+-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
+                           QUERY PLAN
+----------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab_p1.b
+         ->  Sort
+               Sort Key: pagg_tab_p1.b
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p1.b
+                           ->  Seq Scan on pagg_tab_p1
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p2_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p2_s1
+                                 ->  Seq Scan on pagg_tab_p2_s2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p3_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p3_s1
+                                 ->  Seq Scan on pagg_tab_p3_s2
+(20 rows)

Why aren't we seeing partial aggregation paths for level two and below
partitions?

+
+-- Test on middle level partitioned table which is further partitioned on b.
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*))
+   ->  Append
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s1.b
+               ->  Seq Scan on pagg_tab_p3_s1
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s2.b
+               ->  Seq Scan on pagg_tab_p3_s2
+(9 rows)
+
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+ b | sum  | count
+---+------+-------
+ 0 | 2000 |   100
+ 1 | 2100 |   100
+ 2 | 2200 |   100
+ 3 | 2300 |   100
+ 4 | 2400 |   100
+ 5 | 2500 |   100
+ 6 | 2600 |   100
+ 7 | 2700 |   100
+ 8 | 2800 |   100
+ 9 | 2900 |   100
+(10 rows)

We should just remove this case, it's same as testing top-level partitioned
tables.

+
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP
BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT
pagg_tab_p1.c))
+   ->  Append
+         ->  GroupAggregate
+               Group Key: pagg_tab_p1.a, pagg_tab_p1.b
+               Filter: (avg(pagg_tab_p1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
+                     ->  Seq Scan on pagg_tab_p1
+         ->  GroupAggregate
+               Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+               Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p2_s1
+                           ->  Seq Scan on pagg_tab_p2_s2
+         ->  GroupAggregate
+               Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+               Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p3_s1
+                           ->  Seq Scan on pagg_tab_p3_s2
+(25 rows)

Instead of an Append node appearing under GroupAggregate, I think we should
flatten all the partition scans for the subpartitions whose partition keys are
part of group keys and add GroupAggregate on top of each of such partition
scans.

+-- Parallelism within partition-wise aggregates
+RESET max_parallel_workers_per_gather;
+SET min_parallel_table_scan_size TO '8kB';
+SET parallel_setup_cost TO 0;
+INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0, 29999) i;

spaces around % operator?

+SHOW max_parallel_workers_per_gather;
+ max_parallel_workers_per_gather
+---------------------------------
+ 2

Why do we need this?

+
+-- When GROUP BY clause matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x
HAVING avg(y) < 7 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)),
(avg(pagg_tab_para_p1.y))
+   ->  Append
[ ... clipped ...]
+         ->  Finalize GroupAggregate
+               Group Key: pagg_tab_para_p3.x
+               Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_para_p3.x
+                     ->  Gather
+                           Workers Planned: 2
+                           ->  Partial HashAggregate
+                                 Group Key: pagg_tab_para_p3.x
+                                 ->  Parallel Seq Scan on pagg_tab_para_p3
[ ... clipped ... ]
+-- When GROUP BY clause not matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y
HAVING avg(x) < 12 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)),
(avg(pagg_tab_para_p1.x))
+   ->  Finalize HashAggregate
+         Group Key: pagg_tab_para_p1.y
[ ... clipped ... ]
+               ->  Gather
+                     Workers Planned: 2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_para_p3.y
+                           ->  Parallel Seq Scan on pagg_tab_para_p3

Per a prior discussion on this thread, we shouldn't produce such plans;
Parallel Append instead?

+SET enable_partition_wise_agg to true;

May be just enable it at the beginning instead of enabling and disabling twice?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#61

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#60)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+                   QUERY PLAN
+-------------------------------------------------
+ Group
+   Group Key: pagg_tab_p1.a
+   ->  Merge Append
+         Sort Key: pagg_tab_p1.a
+         ->  Group
+               Group Key: pagg_tab_p1.a
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a
+                     ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)

It's strange that we do not annotate partial grouping as Partial. Does not look
like a bug in your patch. Do we get similar output with parallel grouping?

I am wrong here. It's not partial grouping; it's two-level grouping. I
think annotating Group as Partial would be misleading. Sorry.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#62

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#60)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

+
+-- Test when input relation for grouping is dummy
+EXPLAIN (COSTS OFF)
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+           QUERY PLAN
+--------------------------------
+ HashAggregate
+   Group Key: pagg_tab.c
+   ->  Result
+         One-Time Filter: false
+(4 rows)
Not part of your patch, I am wondering if we can further optimize this plan by
converting HashAggregate to Result (One-time Filter: false) and the aggregate
target. Just an idea.

This comment is also wrong. The finalization step of aggregates needs
to be executed irrespective of whether or not the underlying scan
produces any rows. It may, for example, add a constant value to the
transition result. We may apply this optimization only when none of
the aggregates have finalization functions, but it may not be worth the
code.
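
The point about finalization can be illustrated with a toy model in
Python. This is only a sketch, not PostgreSQL executor code: the aggregate
sum_plus_one and the helper run_aggregate are hypothetical, and the model
assumes an ungrouped aggregate with a non-NULL initial state whose final
function adds a constant. Even when the scan yields no rows, the final
function still runs and produces a value.

```python
from typing import Callable, Iterable


def run_aggregate(rows: Iterable[int],
                  init_state: int,
                  trans_fn: Callable[[int, int], int],
                  final_fn: Callable[[int], int]) -> int:
    """Toy aggregate: fold rows into a state, then apply the final function."""
    state = init_state
    for value in rows:
        state = trans_fn(state, value)
    # The final function is applied even if the loop body never ran.
    return final_fn(state)


def sum_plus_one(rows: Iterable[int]) -> int:
    """Hypothetical aggregate: sums its input, final function adds 1."""
    return run_aggregate(rows, 0, lambda s, v: s + v, lambda s: s + 1)


empty_result = sum_plus_one([])            # scan produced no rows
nonempty_result = sum_plus_one([1, 2, 3])  # ordinary case
```

In this model the empty input still yields the constant contributed by the
final function, which is why the aggregate node cannot simply be replaced
by an empty Result.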

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#63

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#52)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Attached patchset with all the review comments reported so far.

On Fri, Dec 1, 2017 at 6:11 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Continuing with review of 0007.
+
+    /* Copy input rels's relids to grouped rel */
+    grouped_rel->relids = input_rel->relids;
Isn't this done in fetch_upper_rel()? Why do we need it here?
There's also a similar hunk in create_grouping_paths() which doesn't look
appropriate. I guess you need relids in grouped_rel->relids for FDW.
There are two ways to do this:
1. Set grouped_rel->relids for the parent upper rel as well, but then we
should pass relids to fetch_upper_rel() instead of setting those later.
2. For a parent upper rel, in create_foreignscan_plan(), set relids to
all_baserels if upper_rel->relids is NULL, and don't set relids for a
parent upper rel.
I am fine with either of those.

Done. Opted for the second option.

+            /* partial phase */
+            get_agg_clause_costs(root, (Node *) partial_target->exprs,
+                                 AGGSPLIT_INITIAL_SERIAL,
+                                 &agg_partial_costs);
IIUC, the costs for evaluating aggregates would not change per child. They
won't be different for the parent and any of the children. So, we should be
able to calculate those once, save them in "extra" and use them repeatedly.

Yep. Done.

+        if (can_sort)
+        {
+            Path       *path = cheapest_path;
+
+            if (!(pathkeys_contained_in(root->group_pathkeys,
+                                        path->pathkeys)))
[ .. clipped patch .. ]
+                                           NIL,
+                                           dNumGroups));
+        }
We create two kinds of paths: partial paths for parallel query, and partial
aggregation paths when group keys do not have partition keys. The comments
and code use "partial" to mean both things, which is rather confusing.
Maybe we should use the term "partial aggregation" explicitly wherever it
means that, in comments and in variable names.

Agree. Used "partial aggregation" wherever applicable. Let me know if you
see any other place that needs this adjustment.

I still feel that create_grouping_paths() and create_child_grouping_paths()
have a lot of common code. While I can see that there are some pockets in
create_grouping_paths() which are not required in
create_child_grouping_paths() and vice versa, maybe we should create only
one function and move those pockets under "if (child)" or "if (parent)"
kind of conditions. It will be a maintenance burden to keep these two
functions in sync in future. If we do not keep them in sync, that will
introduce bugs.

Agree that keeping these two functions in sync in future will be a
maintenance burden, but I am not yet sure how to refactor them cleanly.
Will give one more try and update those changes in the next patchset.

+
+/*
+ * create_partition_agg_paths
+ *
+ * Creates append path for all the partitions and adds into the grouped rel.
I think you want to write "Creates an append path containing paths from
all the child grouped rels and adds into the given parent grouped rel".

Reworded as you said.

+ * For partial mode we need to add a finalize agg node over append path before
+ * adding a path to grouped rel.
+ */
+void
+create_partition_agg_paths(PlannerInfo *root,
+                           RelOptInfo *grouped_rel,
+                           List *subpaths,
+                           List *partial_subpaths,
Why do we have these two as separate arguments? I don't see any call to
create_partition_agg_paths() through add_append_path() passing both of them
non-NULL simultaneously. Maybe you want to use a single list of subpaths and
another boolean indicating whether it's a list of partial paths or regular
paths.

After redesigning in the area of putting Gather over Append, I don't need
to pass all Append subpaths to this function at all. Append is done by
add_paths_to_append_rel() itself. This function now just adds finalization
steps as needed.
So, we don't have two lists now. And to know about partial paths, I pass a
boolean instead. Please have a look and let me know if I missed anything.

+
+    /* For non-partial path, just create a append path and we are done. */
This is the kind of confusion, I am talking about above. Here you have
mentioned "non-partial path" which may mean a regular path but what you
actually mean by that term is a path representing partial aggregates.

+    /*
+     * Partial partition-wise grouping paths.  Need to create final
+     * aggregation path over append relation.
+     *
+     * If there are partial subpaths, then we need to add gather path before we
+     * append these subpaths.

More confusion here.

Hopefully no more confusion in this new version.

+     */
+    if (partial_subpaths)
+    {
+        ListCell   *lc;
+
+        Assert(subpaths == NIL);
+
+        foreach(lc, partial_subpaths)
+        {
+            Path       *path = lfirst(lc);
+            double        total_groups = path->rows * path->parallel_workers;
+
+            /* Create gather paths for partial subpaths */
+            Path *gpath = (Path *) create_gather_path(root, grouped_rel, path,
+                                                      path->pathtarget, NULL,
+                                                      &total_groups);
+
+            subpaths = lappend(subpaths, gpath);

Using the argument variable is confusing and that's why you need two
different List variables. Instead, you could probably have another variable
local to this function to hold the gather subpaths.

Done.

AFAIU, the Gather paths that this code creates have their parent set to the
parent grouped rel. That's not correct. These partial paths come from
children of the grouped rel, and each Gather is producing rows corresponding
to one child of the grouped rel. So the Gather path's parent should be set
to the corresponding child and not the parent grouped rel.

Yep.

This code creates plans where there are multiple Gather nodes under an
Append node. AFAIU, the workers assigned to one Gather node cannot be reused
until that Gather node finishes. Having multiple Gather nodes under an
Append means that every worker will be idle from the time that worker
finishes its work till the last worker finishes its work. That doesn't seem
to be an optimal use of workers. The plan that we create with Gather on top
of Append seems to be better. So, we should avoid creating one Gather node
per child plan. Have we tried to compare the performance of these two plans?

Agree. Having Gather on top of the Append is better. Done that way. It
resolves your previous comment too.

+        if (!IsA(apath, MergeAppendPath) && root->group_pathkeys)
+        {
+            spath = (Path *) create_sort_path(root,
+                                              grouped_rel,
+                                              apath,
+                                              root->group_pathkeys,
+                                              -1.0);
+        }
The code here assumes that a MergeAppend path will always have pathkeys
matching group_pathkeys. I believe that's true but probably we should have
an Assert to make it clear and add comments. If that's not true, we will
need to sort the output of MergeAppend OR discard MergeAppend paths which do
not have pathkeys matching group_pathkeys.

Oops. Thanks for pointing that out. You are correct.
Added a check for whether the required pathkeys are present.
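
The pathkeys check discussed above can be sketched in Python. This is a
simplified model under stated assumptions, not the planner code: pathkeys
are modelled as plain lists of column names, and prefix_contained_in is a
hypothetical stand-in that mimics, as far as I understand it, the prefix
test that pathkeys_contained_in() performs. The idea is that a path needs
an explicit sort unless the grouping pathkeys are a leading prefix of the
ordering the path already provides.

```python
from typing import List


def prefix_contained_in(required: List[str], provided: List[str]) -> bool:
    """True if `required` is a leading prefix of `provided`."""
    return (len(required) <= len(provided)
            and provided[:len(required)] == required)


def needs_sort(group_pathkeys: List[str], path_pathkeys: List[str]) -> bool:
    """Decide whether a Sort must be added above a path before grouping."""
    if not group_pathkeys:
        # No ordering is required for grouping, so nothing to sort.
        return False
    return not prefix_contained_in(group_pathkeys, path_pathkeys)


# A MergeAppend-like path ordered on (a, b) satisfies grouping on (a).
sorted_enough = needs_sort(["a"], ["a", "b"])
# A path ordered only on (a) does not satisfy grouping on (a, b).
too_short = needs_sort(["a", "b"], ["a"])
# An unordered Append-like path always needs a sort for sorted grouping.
unordered = needs_sort(["a"], [])
```

Checking the ordering explicitly, rather than testing the node type, covers
the case where a MergeAppend is ordered on something other than the
grouping keys.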

diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b422050..1941468 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2345,6 +2345,7 @@ UnlistenStmt
UnresolvedTup
UnresolvedTupData
UpdateStmt
+UpperPathExtraData
UpperRelationKind
UpperUniquePath
UserAuth

Do we commit this file as part of the feature?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

This patchset contains fixes for other review comments too.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#64

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#58)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Here are review comments for 0009
Only full aggregation is pushed on the remote server.

I think we can live with that for a while, but we need to be able to push
down partial aggregates to the foreign server. I agree that it needs some
infrastructure to serialize and deserialize the partial aggregate values,
and that we should support unpartitioned aggregation first and then work on
partitioned aggregates. That is definitely a separate piece of work.

Yep.
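
As an illustration of what pushing down partial aggregates would involve,
here is a small Python sketch. It is a toy model under stated assumptions,
not postgres_fdw code: the wire format (two signed 64-bit integers) and all
function names are hypothetical. Each "remote" partition computes a partial
avg state (sum, count), serializes it, and the local side deserializes,
combines and finalizes.

```python
import struct
from fractions import Fraction
from typing import List, Optional, Tuple

State = Tuple[int, int]  # (running sum, row count)


def partial_avg(values: List[int]) -> State:
    """Partial aggregation step, run once per partition."""
    return (sum(values), len(values))


def serialize(state: State) -> bytes:
    """Hypothetical wire format: two big-endian signed 64-bit integers."""
    return struct.pack("!qq", *state)


def deserialize(payload: bytes) -> State:
    total, count = struct.unpack("!qq", payload)
    return (total, count)


def combine(left: State, right: State) -> State:
    """Combine two partial states into one."""
    return (left[0] + right[0], left[1] + right[1])


def finalize(state: State) -> Optional[Fraction]:
    """Final step: produce avg, or None when no rows were seen."""
    total, count = state
    return None if count == 0 else Fraction(total, count)


partitions = [[1, 2, 3], [10, 20], []]
payloads = [serialize(partial_avg(p)) for p in partitions]

state: State = (0, 0)
for payload in payloads:
    state = combine(state, deserialize(payload))
result = finalize(state)
```

The combined result equals the average over all rows, which is the property
a real serialize/deserialize infrastructure would have to preserve.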

+CREATE TABLE pagg_tab_p3 (a int, b int, c text);

Like the partition-wise join testcases, please use LIKE so that it's easy to
change the table schema if required.

Done.

+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;

We have to do this because INSERT tuple routing to a foreign partition is
not supported right now.

Yes.

Somebody has to remember to change this to a single
statement once that's done.

I don't know how we can keep track of it.

+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
I thought this is not needed. When you ANALYZE the partitioned table, it
would analyze the partitions as well. But I see that the partition-wise join
tests also ANALYZE the foreign partitions separately. When I ran ANALYZE on
a partitioned table with foreign partitions, statistics for only the local
tables (partitioned and partitions) were updated. Of course this is separate
work, but it probably needs to be fixed.

Hmm.

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan when partition-wise-agg is disabled
s/when/with/

+-- Plan when partition-wise-agg is enabled

s/when/with/

Done.

+ -> Append

Just like the ForeignScan node's Relations tells what kind of ForeignScan
this is, maybe we should annotate Append to tell whether the children are
joins, aggregates or relation scans. That might be helpful. Of course, as
another patch.

OK.

+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 25 ORDER BY 1;
+ a  | sum  | min | count
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
[ ... clipped ...]
+ 23 | 2300 |   3 |   100
+ 24 | 2400 |   4 |   100
+(15 rows)

Maybe we want to reduce the number of rows to a few by using a stricter
HAVING clause?

Done.
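
For reference, the data distribution produced by the three INSERT
statements quoted above, and the rows shown for the HAVING avg(b) < 25
query, can be reproduced with a short Python sketch. It only mirrors the
arithmetic of the test (a = i % 30, b = i % 50 for i in 1..3000); the
helper name partition_of is hypothetical, and nothing here touches a
database.

```python
from collections import defaultdict
from typing import Dict, List


def partition_of(a: int) -> str:
    """Mirror the WHERE clauses of the three INSERT statements."""
    if a < 10:
        return "pagg_tab_p1"
    if a < 20:
        return "pagg_tab_p2"
    return "pagg_tab_p3"


rows_per_partition: Dict[str, int] = defaultdict(int)
groups: Dict[int, List[int]] = defaultdict(list)  # a -> list of b values

for i in range(1, 3001):
    a, b = i % 30, i % 50
    rows_per_partition[partition_of(a)] += 1
    groups[a].append(b)

# SELECT a, sum(b), min(b), count(*) ... GROUP BY a
# HAVING avg(b) < 25 ORDER BY 1
result = sorted(
    (a, sum(bs), min(bs), len(bs))
    for a, bs in groups.items()
    if sum(bs) / len(bs) < 25
)
```

Each partition receives 1000 rows, every value of a forms a group of 100
rows that lives entirely in one partition, and the filtered result has the
15 rows shown in the expected output.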

+
+-- When GROUP BY clause not matches with PARTITION KEY.
... clause does not match ...

Done.

+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 800 ORDER BY 1;
+                                      QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------------------------
+ Sort
+   Output: fpagg_tab_p1.b, (avg(fpagg_tab_p1.a)), (max(fpagg_tab_p1.a)), (count(*))
+               ->  Partial HashAggregate
[ ... clipped ... ]
+                     Output: fpagg_tab_p3.b, PARTIAL avg(fpagg_tab_p3.a), PARTIAL max(fpagg_tab_p3.a), PARTIAL count(*), PARTIAL sum(fpagg_tab_p3.a)
+                     Group Key: fpagg_tab_p3.b
+                     ->  Foreign Scan on public.fpagg_tab_p3
+                           Output: fpagg_tab_p3.b, fpagg_tab_p3.a
+                           Remote SQL: SELECT a, b FROM public.pagg_tab_p3
+(26 rows)

I think we are interested in the overall shape of the plan and not the
details of Remote SQL etc. So, maybe turn off VERBOSE. This comment also
applies to an earlier plan with enable_partition_wise_agg = false;

OK. Removed VERBOSE from all the queries as we are interested in overall
shape of the plan.

+
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 800 ORDER BY 1;
+ b  |         avg         | max | count
+----+---------------------+-----+-------
+  0 | 10.0000000000000000 |  20 |    60
+  1 | 11.0000000000000000 |  21 |    60
[... clipped ...]
+ 42 | 12.0000000000000000 |  22 |    60
+ 43 | 13.0000000000000000 |  23 |    60
+(20 rows)
Since the aggregates were not pushed down, I doubt we should be testing the
output. But this test is good to check partial aggregates over foreign
partition scans, which we don't have in postgres_fdw.sql I think. So, maybe
add it as a separate patch?

Agree. Removed SELECT query. EXPLAIN is enough here.

Can you please add a test where we reference a whole-row; that usually has
troubles.

Added.

-    if (root->hasHavingQual && query->havingQual)
+    if (root->hasHavingQual && fpinfo->havingQual)
This is not exactly a problem with your patch, but why do we need to check
both the boolean and the actual clauses? If the boolean is true,
query->havingQual should be non-NULL, and NULL otherwise.

Agree. But since this is not the fault of this patch, I have kept it as is.

/* Grouping information */
List       *grouped_tlist;
+    PathTarget *grouped_target;
+    Node       *havingQual;
I think we don't need havingQual as a separate member.
foreign_grouping_ok() separates the clauses in havingQual into shippable and
non-shippable clauses and saves them in local_conditions and
remote_conditions. Probably we want to use those instead of adding a new
member.

local/remote_conditions is a list of RestrictInfos which cannot be used
while costing the aggregates. So we anyways need to store the havingQual.

index 04e43cc..c8999f6 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
UpperRelationKind stage,
RelOptInfo *input_rel,
-                                               RelOptInfo *output_rel);
+                                               RelOptInfo *output_rel,
+                                               UpperPathExtraData *extra);
Probably this comment belongs to 0007, but it's in this patch that it
becomes clear how invasive the UpperPathExtraData changes are. While
UpperPathExtraData has upper paths in the name, all of its members are
related to grouping. That's fine since we only support partition-wise
aggregate and not the other upper operations. But if we were to do that in
future, which of these members would be applicable to other upper relations?
inputRows, pathTarget, partialPathTarget may be applicable to other upper
rels as well. can_sort, can_hash may be applicable to DISTINCT, SORT
relations. isPartial and havingQual will be applicable only to
Grouping/Aggregation. So, maybe it's ok, and like RelOptInfo we may separate
them by comments.

I have grouped them like you said but have not added any comments yet, as I
am not sure at this point which fields will be used by those other upper
rel kinds.
Please have a look.

Another problem with that structure is that its name doesn't mention that
the structure is used only for child upper relations, whereas the code
assumes that if extra is not present it's a parent upper relation. Maybe we
want to rename it to that effect or always use it, whether for a parent or a
child relation.

Renamed to OtherUpperPathExtraData.

We may want to rename pathTarget and partialPathTarget as relTarget and
partialRelTarget since those targets are not specific to any path, but will
be applicable to all the paths created for that rel.

Renamed.

These fixes are part of the v9 patchset.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#65

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Ashutosh Bapat (#60)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Sure, no problem. Take your time. Here's a set of comments for 0008. That
ends the first read of all the patches (2nd reading for the core
changes).
+-- Also, disable parallel paths.
+SET max_parallel_workers_per_gather TO 0;
If you enable parallel aggregation for smaller data, partition-wise
aggregation paths won't be chosen. I think this is the reason why you are
disabling parallel query. But we should probably explain that in a comment.
Better if we could come up with testcases without disabling parallel query.
Since parallel append is now committed, maybe it can help.

Removed.

+
+-- Check with multiple columns in GROUP BY, order in target-list is reversed
+EXPLAIN (COSTS OFF)
+SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
+                   QUERY PLAN
+-------------------------------------------------
+ Append
+   ->  HashAggregate
+         Group Key: pagg_tab_p1.a, pagg_tab_p1.c
+         ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(10 rows)

Why do we need this testcase?

Rajkumar earlier reported an issue when the order in the target list is
reversed. The fix then required redesigning the GROUP key matching
algorithm. So I think it will be good to have this testcase.

+
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ c | sum
+---+-----
+(0 rows)
I think we also need a case when the child input relations are marked
dummy and
then the parent is marked dummy. Just use a condition with partkey = <none
of
list bounds>.

I have added the testcase for that. But don't you think both are the same?
When all input children are dummy, the parent too is marked as dummy, i.e.
the input relation is itself dummy.
Am I missing something here?

+
+-- Check with SORTED paths. Disable hashagg to get group aggregate
Suggest: "Test GroupAggregate paths by disabling hash aggregates."

+-- When GROUP BY clause matches with PARTITION KEY.

I don't think we need "with", and just extend the same sentence with
"complete aggregation is performed for each partition"

+-- Should choose full partition-wise aggregation path

suggest: "Should choose full partition-wise GroupAggregate plan", but I
guess with the above suggestion, this sentence is not needed.
+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path
Similar suggestions as above.

+-- No aggregates, but still able to perform partition-wise aggregates

That's a funny construction. Maybe "Test partition-wise grouping without
any aggregates".

We should try some output for this query.
+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+                   QUERY PLAN
+-------------------------------------------------
+ Group
+   Group Key: pagg_tab_p1.a
+   ->  Merge Append
+         Sort Key: pagg_tab_p1.a
+         ->  Group
+               Group Key: pagg_tab_p1.a
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a
+                     ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)
It's strange that we do not annotate partial grouping as Partial. Does not
look like a bug in your patch. Do we get similar output with parallel
grouping?

It's partial aggregation only, which is finalized at the top.
But since there are no aggregates involved, we create a GROUP path and not
an AGG path. A GROUP path has no partial annotations.
Yes, we see a similar plan for parallel grouping too.
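
The shape of that plan (Group over Merge Append over per-partition Group
over Sort) can be modelled with a few lines of Python. This is only a
sketch of the data flow with made-up partition contents; the function names
are hypothetical. With no aggregates there is no transition state to
combine, so the "partial" and "final" steps are both plain duplicate
elimination on sorted input.

```python
import heapq
from itertools import groupby
from typing import Iterable, List


def group_sorted(values: Iterable[int]) -> List[int]:
    """Model of a Group node: collapse adjacent duplicates in sorted input."""
    return [key for key, _ in groupby(values)]


def partitionwise_group(partitions: List[List[int]]) -> List[int]:
    # Per partition: Sort, then Group.
    per_partition = [group_sorted(sorted(rows)) for rows in partitions]
    # Merge Append: merge the already-sorted per-partition outputs.
    merged = heapq.merge(*per_partition)
    # Top-level Group: the same key may arrive from several partitions.
    return group_sorted(merged)


# Assumed sample data where keys are spread across partitions.
partitions = [[3, 1, 1], [2, 3, 3], [1, 2]]
grouped = partitionwise_group(partitions)
```

The lower Group nodes only shrink the input to the merge; the top Group is
still required because a key can appear in more than one partition.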

+-- ORDERED SET within aggregate
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
+                               QUERY PLAN
+------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
+   ->  GroupAggregate
+         Group Key: pagg_tab_p1.a
+         ->  Sort
+               Sort Key: pagg_tab_p1.a
+               ->  Append
+                     ->  Seq Scan on pagg_tab_p1
+                     ->  Seq Scan on pagg_tab_p2
+                     ->  Seq Scan on pagg_tab_p3
+(10 rows)

pagg_tab is partitioned by column c. So, not having it in GROUP BY
itself might produce this plan if Partial parallel aggregation is
expensive. When testing negative tests like this, GROUP BY should always
have the partition key.

I deliberately wanted to test the case where the GROUP BY key does not match
the partition key, so that partial aggregation is forced. But we do have
some limitations on performing aggregation in partial mode, i.e. ORDERED SET
cannot be done in partial mode; this test exercises that code path.
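
Why an aggregate with ORDER BY is hard to split into partial steps can be
seen with an order-sensitive aggregate. The Python sketch below is a toy
model with assumed data, using a hypothetical string_agg-like function
rather than the sum() from the test, because the effect is visible in the
output. When a group's rows are spread over several partitions, ordering
within each partition and then gluing the partial results together does not
give the globally ordered result.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (sort key, value)


def ordered_concat(rows: List[Row]) -> str:
    """Model of an aggregate like string_agg(value ORDER BY key)."""
    return "".join(value for _, value in sorted(rows))


# One group whose rows live in two different partitions.
partitions: List[List[Row]] = [
    [(1, "a"), (4, "d")],
    [(2, "b"), (3, "c")],
]

all_rows = [row for part in partitions for row in part]

# Aggregating over the whole input honours the global order.
full = ordered_concat(all_rows)

# Naively concatenating per-partition results does not.
naive = "".join(ordered_concat(part) for part in partitions)
```

When the GROUP BY contains the partition key, all rows of a group come from
one partition, so the per-partition result is already the complete one and
the ordering problem does not arise.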

In case of full aggregation, since all the rows that belong to the same
group come from the same partition, having an ORDER BY doesn't make any
difference. We should support such a case.

We do support this.
Added testcase for it.

+INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
+INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;
spaces around % operator?
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Should choose full partition-wise aggregation path.
Probably we should just club single table and join cases under one set of
comments rather than repeating those? Create the tables once at the
beginning of the test file and group together the queries under one comment
head.

I think the other way round. It will be good to have the corresponding
CREATE/INSERTs near the test queries to avoid lengthy scrolls to see the
table structure and data. Each query has a comment to describe what it does.

+-- Disable mergejoin to get hash aggregate.
+SET enable_mergejoin TO false;

Why? We have tested that once.

Removed.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path.
+-- Also check with SORTED paths. Disable hashagg to get group aggregate.
+SET enable_hashagg TO false;
Same as above. Two of those clubbed together will produce one hash and one
group plan. That will cover it.

For join queries, a plan with GroupAgg is not chosen by default, which I
wanted to have in the test coverage. Thus kept this as is.
We have tested GroupAgg for single partitioned relations though. Let me
know if you think this test is not necessary; I will remove it then.

+-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
+-- aggregation
+-- LEFT JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column
+EXPLAIN (COSTS OFF)
+SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY 1 ORDER BY 1 NULLS LAST;

May be you should explicitly use GROUP BY b.y in all of these queries.

I actually wanted to test GROUP BY n case too. But as you said, in these
queries I have used b.y and modified some other queries to have positional
notation in GROUP BY.

+-- FULL JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column
In case of a FULL JOIN, partition keys from the joining relations land on
the nullable side; there is no key on the non-nullable side, so an
aggregation on top of a FULL JOIN will always be partial partition-wise
aggregation.

Yep.
Do you want me to add this explanation in the comment? I don't think so.
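
The effect of grouping on a nullable column can be sketched in Python. This
is a toy model with assumed data and hypothetical names, and it uses a LEFT
JOIN for brevity: each partition pair is joined separately, unmatched outer
rows get NULL (None) for the inner key, and the NULL group therefore
receives rows from every partition pair. The per-partition counts are only
partial and must be combined at the top.

```python
from collections import Counter
from typing import List, Optional, Tuple

PartitionPair = Tuple[List[int], List[int]]  # (outer a.x values, inner b.y values)


def left_join_group_count(outer: List[int],
                          inner: List[int]) -> "Counter[Optional[int]]":
    """Per-partition: a LEFT JOIN b ON a.x = b.y, then count(*) GROUP BY b.y.

    Assumes inner keys are unique within the partition.
    """
    inner_keys = set(inner)
    counts: "Counter[Optional[int]]" = Counter()
    for x in outer:
        # Unmatched outer rows are null-extended, so b.y is None.
        counts[x if x in inner_keys else None] += 1
    return counts


pairs: List[PartitionPair] = [
    ([1, 2], [1]),      # first partition pair: 2 has no match
    ([11, 12], [12]),   # second partition pair: 11 has no match
]

partials = [left_join_group_count(outer, inner) for outer, inner in pairs]

combined: "Counter[Optional[int]]" = Counter()
for partial in partials:
    combined.update(partial)
```

Non-NULL groups stay within a single partition pair, but the None group
spans both, which is why full partition-wise aggregation is not possible
when the grouping key comes from the nullable side.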

+-- Empty relations on LEFT side, no partition-wise agg plan.

Suggest: Empty join relation because of empty outer side. I don't think
we are writing a negative test to check whether partition-wise agg plan is
not chosen. We are testing the case when the join relation is empty.

I didn't get what exactly you mean here. However updated the comment as per
your suggestion.

+
+EXPLAIN (COSTS OFF)
+SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a, (a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3;
Keep this or the previous one; both are overkill. I will vote for this one,
but it's up to you.

Removed previous one.

Maybe add a testcase with the partition keys themselves switched; output
just the plan.

I don't think we need this, instead modified the earlier one. Please have a
look.

+-- Test with multi-level partitioning scheme
+-- Partition-wise aggregation is tried only on first level.
[ ... clipped ... ]
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
This seems to contradict the previous comment. Maybe club them together
and say "Partition-wise aggregation with full aggregation only at the first
level" and move that whole comment down.

+
+-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
+                           QUERY PLAN
+----------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab_p1.b
+         ->  Sort
+               Sort Key: pagg_tab_p1.b
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p1.b
+                           ->  Seq Scan on pagg_tab_p1
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p2_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p2_s1
+                                 ->  Seq Scan on pagg_tab_p2_s2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p3_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p3_s1
+                                 ->  Seq Scan on pagg_tab_p3_s2
+(20 rows)

Why aren't we seeing partial aggregation paths for level two and below
partitions?

In this version of the patch I have not recursed into next level.
Will work on it and submit changes in the next patch-set.

+
+-- Test on middle level partitioned table which is further partitioned on b.
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*))
+   ->  Append
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s1.b
+               ->  Seq Scan on pagg_tab_p3_s1
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s2.b
+               ->  Seq Scan on pagg_tab_p3_s2
+(9 rows)
+
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+ b | sum  | count
+---+------+-------
+ 0 | 2000 |   100
+ 1 | 2100 |   100
+ 2 | 2200 |   100
+ 3 | 2300 |   100
+ 4 | 2400 |   100
+ 5 | 2500 |   100
+ 6 | 2600 |   100
+ 7 | 2700 |   100
+ 8 | 2800 |   100
+ 9 | 2900 |   100
+(10 rows)

We should just remove this case; it's the same as testing top-level
partitioned tables.

Removed.

+
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT pagg_tab_p1.c))
+   ->  Append
+         ->  GroupAggregate
+               Group Key: pagg_tab_p1.a, pagg_tab_p1.b
+               Filter: (avg(pagg_tab_p1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
+                     ->  Seq Scan on pagg_tab_p1
+         ->  GroupAggregate
+               Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+               Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p2_s1
+                           ->  Seq Scan on pagg_tab_p2_s2
+         ->  GroupAggregate
+               Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+               Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p3_s1
+                           ->  Seq Scan on pagg_tab_p3_s2
+(25 rows)

Instead of an Append node appearing under GroupAggregate, I think we should
flatten all the partition scans for the subpartitions whose partition keys
are part of group keys and add GroupAggregate on top of each of such
partition scans.

Yes. As explained earlier, will do that as a separate patch.

+-- Parallelism within partition-wise aggregates
+RESET max_parallel_workers_per_gather;
+SET min_parallel_table_scan_size TO '8kB';
+SET parallel_setup_cost TO 0;
+INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0, 29999) i;

spaces around % operator?

+SHOW max_parallel_workers_per_gather;
+ max_parallel_workers_per_gather
+---------------------------------
+ 2

Why do we need this?

Removed.

+
+-- When GROUP BY clause matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)), (avg(pagg_tab_para_p1.y))
+   ->  Append
[ ... clipped ...]
+         ->  Finalize GroupAggregate
+               Group Key: pagg_tab_para_p3.x
+               Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_para_p3.x
+                     ->  Gather
+                           Workers Planned: 2
+                           ->  Partial HashAggregate
+                                 Group Key: pagg_tab_para_p3.x
+                                 ->  Parallel Seq Scan on pagg_tab_para_p3
[ ... clipped ... ]
+-- When GROUP BY clause not matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y HAVING avg(x) < 12 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)), (avg(pagg_tab_para_p1.x))
+   ->  Finalize HashAggregate
+         Group Key: pagg_tab_para_p1.y
[ ... clipped ... ]
+               ->  Gather
+                     Workers Planned: 2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_para_p3.y
+                           ->  Parallel Seq Scan on pagg_tab_para_p3

Per a prior discussion on this thread, we shouldn't produce such plans;
Parallel Append instead?

Yes. We do get a Parallel Append path now.
For full aggregation, a normal Append plan is chosen over Parallel Append,
but we do create that path.

+SET enable_partition_wise_agg to true;

Maybe just enable it at the beginning instead of enabling and disabling
twice?

Done as you said. However, this affected one more testcase from
partition_join.sql. Updated expected output for that too.

Comments to which I have not responded are all done.

All these fixes are part of v9 patchset.

Thanks Ashutosh for detailed reviews so far.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#66

Jeevan Chalke

jeevan.chalke@enterprisedb.com

about 8 years ago

In reply to: Jeevan Chalke (#65)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

+
+-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
+                           QUERY PLAN
+----------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab_p1.b
+         ->  Sort
+               Sort Key: pagg_tab_p1.b
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p1.b
+                           ->  Seq Scan on pagg_tab_p1
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p2_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p2_s1
+                                 ->  Seq Scan on pagg_tab_p2_s2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p3_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p3_s1
+                                 ->  Seq Scan on pagg_tab_p3_s2
+(20 rows)

Why aren't we seeing partial aggregation paths for level two and below
partitions?

In this version of the patch I have not recursed into next level.
Will work on it and submit changes in the next patch-set.

Attached new set of patches adding this. Only patch 0007 (main patch) and
0008 (testcase patch) have changed.

Please have a look and let me know if I missed any.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#67

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#66)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new set of patches adding this. Only patch 0007 (main patch) and
0008 (testcase patch) have changed.

Please have a look and let me know if I missed any.

I spent a little time studying 0001 and 0002 today, as well as their
relation with 0007. I find the way that the refactoring has been done
slightly odd. With 0001 and 0002 applied, we end up with three
functions for creating aggregate paths: create_partial_agg_path, which
handles the partial-path case for both sort and hash;
create_sort_agg_path, which handles the sort case for non-partial
paths only; and create_hash_agg_path, which handles the hash case for
non-partial paths only. This leads to the following code in 0007:

+               /* Full grouping paths */
+
+               if (try_parallel_aggregation)
+               {
+                       Assert(extra->agg_partial_costs && extra->agg_final_costs);
+                       create_partial_agg_path(root, input_rel, grouped_rel, target,
+                                               partial_target, extra->agg_partial_costs,
+                                               extra->agg_final_costs, gd, can_sort,
+                                               can_hash, (List *) extra->havingQual);
+               }
+
+               if (can_sort)
+                       create_sort_agg_path(root, input_rel, grouped_rel, target,
+                                            partial_target, agg_costs,
+                                            extra->agg_final_costs, gd, can_hash,
+                                            dNumGroups, (List *) extra->havingQual);
+
+               if (can_hash)
+                       create_hash_agg_path(root, input_rel, grouped_rel, target,
+                                            partial_target, agg_costs,
+                                            extra->agg_final_costs, gd, dNumGroups,
+                                            (List *) extra->havingQual);

That looks strange -- you would expect to see either "sort" and "hash"
cases here, or maybe "partial" and "non-partial", or maybe all four
combinations, but seeing three things here looks surprising. I think
the solution is just to create a single function that does both the
work of create_sort_agg_path and the work of create_hash_agg_path
instead of having two separate functions.
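As a rough illustration of the single-function shape suggested above, here is a toy sketch. The types and the name toy_add_paths_to_grouping_rel() are hypothetical stand-ins for illustration only; they are not the planner's real structures or the function that was eventually committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not PostgreSQL's planner types. */
typedef enum
{
    TOY_STRATEGY_SORTED,
    TOY_STRATEGY_HASHED
} ToyAggStrategy;

typedef struct
{
    ToyAggStrategy strategies[2];   /* at most one sorted and one hashed path */
    size_t         npaths;
} ToyGroupedRel;

/*
 * One entry point that considers both strategies, so the caller makes a
 * single call instead of one call per strategy.  Returns the number of
 * paths added.  The rel is expected to be empty on entry.
 */
size_t
toy_add_paths_to_grouping_rel(ToyGroupedRel *rel, bool can_sort, bool can_hash)
{
    assert(rel != NULL && rel->npaths == 0);

    if (can_sort)
        rel->strategies[rel->npaths++] = TOY_STRATEGY_SORTED;
    if (can_hash)
        rel->strategies[rel->npaths++] = TOY_STRATEGY_HASHED;

    return rel->npaths;
}
```

With this shape the call site has two branches (partial and non-partial) rather than three.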

A related thing that is also surprising is that 0007 manages to reuse
create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
cases -- in fact, the calls appear to be identical, and could be
hoisted out of the "if" statement -- but create_sort_agg_path and
create_hash_agg_path do not get reused. I think you should see
whether you can define the new combo function that can be used for
both cases. The logic looks very similar, and I'm wondering why it
isn't more similar than it is; for instance, create_sort_agg_path
loops over the input rel's pathlist, but the code for
isPartialAgg/can_sort seems to consider only the cheapest path. If
this is correct, it needs a comment explaining it, but I don't see why
it should be correct.
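To make the concern about considering only the cheapest path concrete, here is a toy cost comparison. ToyPath, the flat sort_cost parameter and both function names are assumptions made up for this sketch; the real planner's costing is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a planner path: a cost and whether the output
 * is already sorted on the grouping key. */
typedef struct
{
    double total_cost;
    bool   presorted;
} ToyPath;

/* Cost of feeding this path to a sorted aggregate: add an explicit sort
 * only when the path is not already sorted. */
static double
toy_sorted_input_cost(const ToyPath *path, double sort_cost)
{
    return path->total_cost + (path->presorted ? 0.0 : sort_cost);
}

/* Looks only at the cheapest input path, ignoring its sort order. */
double
toy_cost_cheapest_only(const ToyPath *paths, size_t n, double sort_cost)
{
    assert(n > 0);
    const ToyPath *cheapest = &paths[0];

    for (size_t i = 1; i < n; i++)
        if (paths[i].total_cost < cheapest->total_cost)
            cheapest = &paths[i];
    return toy_sorted_input_cost(cheapest, sort_cost);
}

/* Loops over the whole pathlist and keeps the best sorted-input cost. */
double
toy_cost_all_paths(const ToyPath *paths, size_t n, double sort_cost)
{
    assert(n > 0);
    double best = toy_sorted_input_cost(&paths[0], sort_cost);

    for (size_t i = 1; i < n; i++)
    {
        double cost = toy_sorted_input_cost(&paths[i], sort_cost);

        if (cost < best)
            best = cost;
    }
    return best;
}
```

With one unsorted path costing 100, one presorted path costing 120 and a sort cost of 50, the cheapest-only variant arrives at 150 while the loop finds 120, which is why skipping the loop would need a justification.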

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#68

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#67)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Jan 16, 2018 at 3:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 11, 2018 at 6:00 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new set of patches adding this. Only patch 0007 (main patch) and
0008 (testcase patch) have changed.

Please have a look and let me know if I missed any.

I spent a little time studying 0001 and 0002 today, as well as their
relation with 0007. I find the way that the refactoring has been done
slightly odd. With 0001 and 0002 applied, we end up with three
functions for creating aggregate paths: create_partial_agg_path, which
handles the partial-path case for both sort and hash;
create_sort_agg_path, which handles the sort case for non-partial
paths only; and create_hash_agg_path, which handles the hash case for
non-partial paths only. This leads to the following code in 0007:
+               /* Full grouping paths */
+
+               if (try_parallel_aggregation)
+               {
+                       Assert(extra->agg_partial_costs && extra->agg_final_costs);
+                       create_partial_agg_path(root, input_rel, grouped_rel, target,
+                                               partial_target, extra->agg_partial_costs,
+                                               extra->agg_final_costs, gd, can_sort,
+                                               can_hash, (List *) extra->havingQual);
+               }
+
+               if (can_sort)
+                       create_sort_agg_path(root, input_rel, grouped_rel, target,
+                                            partial_target, agg_costs,
+                                            extra->agg_final_costs, gd, can_hash,
+                                            dNumGroups, (List *) extra->havingQual);
+
+               if (can_hash)
+                       create_hash_agg_path(root, input_rel, grouped_rel, target,
+                                            partial_target, agg_costs,
+                                            extra->agg_final_costs, gd, dNumGroups,
+                                            (List *) extra->havingQual);
That looks strange -- you would expect to see either "sort" and "hash"
cases here, or maybe "partial" and "non-partial", or maybe all four
combinations, but seeing three things here looks surprising. I think
the solution is just to create a single function that does both the
work of create_sort_agg_path and the work of create_hash_agg_path
instead of having two separate functions.

In existing code (in create_grouping_paths()), I see following pattern:
if (try_parallel_aggregation)
    if (can_sort)
    if (can_hash)
if (can_sort)
if (can_hash)

And thus, I have created three functions to match with existing pattern.

I will make your suggested change, that is, merge create_sort_agg_path() and
create_hash_agg_path(). I will name that function
create_sort_and_hash_agg_paths().

A related thing that is also surprising is that 0007 manages to reuse
create_partial_agg_path for both the isPartialAgg and non-isPartialAgg
cases -- in fact, the calls appear to be identical, and could be
hoisted out of the "if" statement

Yes. We can do that as well and I think it is better too.
I was just trying to preserve the existing pattern. So for PWA I chose:
if partialAgg
    if (try_parallel_aggregation)
        if (can_sort)
        if (can_hash)
    if (can_sort)
    if (can_hash)
else fullAgg
    if (try_parallel_aggregation)
        if (can_sort)
        if (can_hash)
    if (can_sort)
    if (can_hash)

But since the if (try_parallel_aggregation) case is exactly the same in both
branches, I will pull it out of the if..else.

-- but create_sort_agg_path and
create_hash_agg_path do not get reused. I think you should see
whether you can define the new combo function that can be used for
both cases. The logic looks very similar, and I'm wondering why it
isn't more similar than it is; for instance, create_sort_agg_path
loops over the input rel's pathlist, but the code for
isPartialAgg/can_sort seems to consider only the cheapest path. If
this is correct, it needs a comment explaining it, but I don't see why
it should be correct.

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With above change, the logic is very similar except
(1) isPartialAgg/can_sort case creates the partial paths and
(2) finalization step is not needed at this stage.

I think it can be done by passing a flag to create_sort_agg_path() (or new
combo function) and making appropriate adjustments. Do you think addition of
this new flag should go in re-factoring patch or main PWA patch?
I think re-factoring patch.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#69

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#68)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Jan 16, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I will make your suggested change, that is, merge create_sort_agg_path() and
create_hash_agg_path(). I will name that function
create_sort_and_hash_agg_paths().

I suggest add_paths_to_grouping_rel() and
add_partial_paths_to_grouping_rel(), similar to what commit
c44c47a773bd9073012935a29b0264d95920412c did with
add_paths_to_append_rel().

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With above change, the logic is very similar except
(1) isPartialAgg/can_sort case creates the partial paths and
(2) finalization step is not needed at this stage.

I'm not sure what you mean by #1.

I think it can be done by passing a flag to create_sort_agg_path() (or new
combo function) and making appropriate adjustments. Do you think addition of
this new flag should go in re-factoring patch or main PWA patch?
I think re-factoring patch.

I think the refactoring patch should move the existing code into a new
function without any changes, and then the main patch should add an
additional argument to that function that allows for either behavior.

By the way, I'm also a bit concerned about this:

+               /*
+                * For full aggregation, we are done with the partial paths.  Just
+                * clear it out so that we don't try to create a parallel plan over it.
+                */
+               grouped_rel->partial_pathlist = NIL;

I think that's being done for the same reason as mentioned at the
bottom of the current code for create_grouping_paths(). They are only
partially aggregated and wouldn't produce correct final results if
some other planning step -- create_ordered_paths, or the code that
sets up final_rel -- used them as if they had been fully aggregated.
I'm worried that there might be an analogous danger for partition-wise
aggregation -- that is, that the paths being inserted into the partial
pathlists of the aggregate child rels might get reused by some later
planning step which doesn't realize that the output they produce
doesn't quite match up with the rel to which they are attached. You
may have already taken care of that problem somehow, but we should
make sure that it's fully correct and clearly commented. I don't
immediately see why the isPartialAgg case should be any different from
the !isPartialAgg case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#70

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#69)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Jan 17, 2018 at 1:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 16, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I will make your suggested change, that is, merge create_sort_agg_path() and
create_hash_agg_path(). I will name that function
create_sort_and_hash_agg_paths().

I suggest add_paths_to_grouping_rel() and
add_partial_paths_to_grouping_rel(), similar to what commit
c44c47a773bd9073012935a29b0264d95920412c did with
add_paths_to_append_rel().

Oops. My mistake. Missed. We should loop over the input rel's pathlist.

Yep. With above change, the logic is very similar except
(1) isPartialAgg/can_sort case creates the partial paths and
(2) finalization step is not needed at this stage.

I'm not sure what you mean by #1.

I mean, in case of isPartialAgg=true, we need to create a partial
aggregation path which has aggsplit=AGGSPLIT_INITIAL_SERIAL and should not
perform finalization at this stage. Thus add_paths_to_grouping_rel()
needs a flag to differentiate the two. Adding that code chunk allows us to
reuse the same function in both cases, i.e. full and partial aggregation.
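To show why a partial aggregation path must not be treated as finalized, here is a minimal sketch of the partial / combine / finalize split for avg(). The ToyAvgState struct and the toy_* functions are hypothetical; PostgreSQL's real transition states and combine functions are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transition state for avg(): a running sum and a row count. */
typedef struct
{
    long sum;
    long count;
} ToyAvgState;

/* Partial step: aggregate one partition's rows into a state.
 * No finalization happens here. */
ToyAvgState
toy_partial_avg(const int *values, size_t n)
{
    ToyAvgState state = {0, 0};

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        state.sum += values[i];
        state.count++;
    }
    return state;
}

/* Combine step: merge the states produced for two partitions. */
ToyAvgState
toy_combine_avg(ToyAvgState a, ToyAvgState b)
{
    ToyAvgState merged = {a.sum + b.sum, a.count + b.count};

    return merged;
}

/* Final step: turn a (merged) state into the user-visible result. */
double
toy_final_avg(ToyAvgState state)
{
    return state.count == 0 ? 0.0 : (double) state.sum / (double) state.count;
}
```

For partitions holding {1, 2, 3} and {10}, combining the states and finalizing once yields 16 / 4 = 4. Finalizing each partition first and averaging those results would yield (2 + 10) / 2 = 6, which is wrong; that is the kind of mistake a later planning step could make if it treated partially aggregated paths as complete.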

Attached patch with other review points fixed.

I think it can be done by passing a flag to create_sort_agg_path() (or new
combo function) and making appropriate adjustments. Do you think addition of
this new flag should go in re-factoring patch or main PWA patch?
I think re-factoring patch.

I think the refactoring patch should move the existing code into a new
function without any changes, and then the main patch should add an
additional argument to that function that allows for either behavior.

By the way, I'm also a bit concerned about this:
+               /*
+                * For full aggregation, we are done with the partial paths.  Just
+                * clear it out so that we don't try to create a parallel plan over it.
+                */
+               grouped_rel->partial_pathlist = NIL;
I think that's being done for the same reason as mentioned at the
bottom of the current code for create_grouping_paths(). They are only
partially aggregated and wouldn't produce correct final results if
some other planning step -- create_ordered_paths, or the code that
sets up final_rel -- used them as if they had been fully aggregated.
I'm worried that there might be an analogous danger for partition-wise
aggregation -- that is, that the paths being inserted into the partial
pathlists of the aggregate child rels might get reused by some later
planning step which doesn't realize that the output they produce
doesn't quite match up with the rel to which they are attached. You
may have already taken care of that problem somehow, but we should
make sure that it's fully correct and clearly commented. I don't
immediately see why the isPartialAgg case should be any different from
the !isPartialAgg case.

Actually, I needed this because in the case of full aggregation we already
build all the final aggregation paths before performing the Append
operation. However, when we do the append in add_paths_to_append_rel(), it
treats the partial_pathlist present in grouped_rel as if it were finalized,
exactly as you mentioned above, and thus we need to clear it out.

But yes, to be on the safer side, I think once we are done with the
partition-wise aggregation step, we need to go through the partitioning
chain again and clear out every child grouped rel's partial_pathlist, for
the reason mentioned at the bottom of the current code for
create_grouping_paths().

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#71

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#70)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Jan 18, 2018 at 8:55 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached patch with other review points fixed.

Committed 0001 and 0002 together, with some cosmetic changes,
including fixing pgindent damage. Please pgindent your patches before
submitting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#72

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#71)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Sat, Jan 27, 2018 at 1:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 18, 2018 at 8:55 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached patch with other review points fixed.

Committed 0001 and 0002 together, with some cosmetic changes,
including fixing pgindent damage.

Thanks Robert.

Please pgindent your patches before
submitting.

Sure, will take care of this.

Attached new patch set and rebased it on latest HEAD.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#73

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#72)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new patch set and rebased it on latest HEAD.

I strongly dislike add_single_path_to_append_rel. It adds branches
and complexity to code that is already very complex. Most
importantly, why are we adding paths to fields in
OtherUpperPathExtraData *extra instead of adding them to the path list
of some RelOptInfo? If we had an appropriate RelOptInfo to which we
could add the paths, then we could make this simpler.

If I understand correctly, the reason you're doing it this way is
because we have no place to put partially-aggregated, non-partial
paths. If we only needed to worry about the parallel query case, we
could just build an append of partially-aggregated paths for each
child and stick it into the grouped rel's partial pathlist, just as we
already do for regular parallel aggregation. There's no reason why
add_paths_to_grouping_rel() needs to care about the difference between a
Partial Aggregate on top of whatever and an Append each branch of
which is a Partial Aggregate on top of whatever. However, this won't
work for non-partial paths, because add_paths_to_grouping_rel() needs
to put paths into the grouped rel's pathlist -- and we can't mix
together partially-aggregated paths and fully-aggregated paths in the
same path list.

But, really, the way we're using grouped_rel->partial_pathlist right
now is an awful hack. What I'm thinking we could do is introduce a
new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
before UPPERREL_GROUP_AGG. Without partition-wise aggregate,
partially_grouped_rel's pathlist would always be NIL, and its partial
pathlist would be constructed using the logic in
add_partial_paths_to_grouping_rel, which would need renaming. Then,
add_paths_to_grouping_rel would use paths from input_rel when doing
non-parallel aggregation and paths from partially_grouped_rel when
doing parallel aggregation. This would eliminate the ugly
grouped_rel->partial_pathlist = NIL assignment at the bottom of
create_grouping_paths(), because the grouped_rel's partial_pathlist
would never have been (bogusly) populated in the first place, and
hence would not need to be reset. All of these changes could be made
via a preparatory patch.
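A toy model of that proposal, as I read it, might look like the following. ToyRel, ToyUpperRels and toy_create_grouping_paths() are made-up names, and the path lists are reduced to counters; the only point is that grouped_rel's partial list is never populated, so nothing has to be reset afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a RelOptInfo, with path lists reduced to counts. */
typedef struct
{
    size_t n_pathlist;          /* complete (non-partial) paths */
    size_t n_partial_pathlist;  /* parallel-partial paths */
} ToyRel;

/* Two upper rels: one for paths that still need FinalizeAggregate, one for
 * fully aggregated paths only. */
typedef struct
{
    ToyRel partially_grouped_rel;
    ToyRel grouped_rel;
} ToyUpperRels;

void
toy_create_grouping_paths(ToyUpperRels *upper,
                          size_t n_input_paths,
                          size_t n_input_partial_paths,
                          bool try_parallel_aggregation)
{
    assert(upper != NULL);

    /* Non-parallel aggregation: one fully aggregated path per input path. */
    upper->grouped_rel.n_pathlist += n_input_paths;

    if (try_parallel_aggregation)
    {
        /* Partially aggregated partial paths live in their own rel ... */
        upper->partially_grouped_rel.n_partial_pathlist += n_input_partial_paths;

        /* ... and Gather + FinalizeAggregate on top of each one yields a
         * complete path for grouped_rel. */
        upper->grouped_rel.n_pathlist +=
            upper->partially_grouped_rel.n_partial_pathlist;
    }

    /* grouped_rel.n_partial_pathlist was never touched: nothing to zap. */
}
```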

Then the main patch needs to worry about four cases:

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key. This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist. The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition. An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

2. Parallel partition-wise aggregate, grouping key contains partition
key. For the most part, this is no different from case #1. We won't
have groups spanning different partitions in this case, but we might
have groups spanning different workers, so we still need a
FinalizeAggregate step. As an exception, Gather -> Parallel Append ->
[non-partial Aggregate path] would give us a way of doing aggregation
in parallel without a separate Finalize step. I'm not sure if we want
to consider that to be in scope for this patch. If we do, then we'd
add the Parallel Append path to grouped_rel->partial_pathlist. Then,
we could stick Gather (Merge) on top of it to produce a path for
grouped_rel->pathlist using generate_gather_paths(); alternatively, it
can be used by upper planning steps -- something we currently can't
ever make work with parallel aggregation.

3. Non-parallel partition-wise aggregate, grouping key contains
partition key. Build Append paths from the children of grouped_rel
and add them to grouped_rel->pathlist.

4. Non-parallel partition-wise aggregate, grouping key doesn't contain
partition key. Build Append paths from the children of
partially_grouped_rel and add them to partially_grouped_rel->pathlist.
Also add code to generate paths for grouped_rel->pathlist by sticking
a FinalizeAggregate step on top of each path from
partially_grouped_rel->pathlist.
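The cases listed above can be summarized as a small decision function. This is only a sketch: the enum and function names are hypothetical, and the optional Gather -> Parallel Append -> [non-partial Aggregate] exception mentioned under the second case is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the path list that receives the new Append paths. */
typedef enum
{
    TOY_PARTIALLY_GROUPED_PARTIAL_PATHLIST, /* partially_grouped_rel->partial_pathlist */
    TOY_GROUPED_PATHLIST,                   /* grouped_rel->pathlist */
    TOY_PARTIALLY_GROUPED_PATHLIST          /* partially_grouped_rel->pathlist */
} ToyAppendDestination;

/* Which list gets the Append of per-partition aggregate paths. */
ToyAppendDestination
toy_append_destination(bool parallel, bool key_contains_partition_key)
{
    if (parallel)
        return TOY_PARTIALLY_GROUPED_PARTIAL_PATHLIST; /* both parallel cases */
    if (key_contains_partition_key)
        return TOY_GROUPED_PATHLIST;        /* non-parallel, full aggregation */
    return TOY_PARTIALLY_GROUPED_PATHLIST;  /* non-parallel, partial aggregation */
}

/* Whether a FinalizeAggregate step is still needed on top of the Append. */
bool
toy_needs_finalize(bool parallel, bool key_contains_partition_key)
{
    /* Parallel: groups may span workers.  Key without partition key: groups
     * may span partitions.  Either way the partial results must be merged. */
    bool needs_finalize = parallel || !key_contains_partition_key;

    assert(needs_finalize || key_contains_partition_key);
    return needs_finalize;
}
```

Only the non-parallel case whose grouping key contains the partition key produces fully aggregated paths directly.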

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.

I think this would probably eliminate some other problems I see in the
existing design as well. For example, create_partition_agg_paths
doesn't consider using Gather Merge, but that might be a win. With
the design above, I think you never need to call create_gather_path()
anywhere. In case #1, the existing code takes care of it. In the
special case mentioned under #2, if we chose to support that,
generate_gather_paths() would take care of it. Both of those places
already know about Gather Merge.

On another note, I found preferFullAgg to be wicked confusing. To
"prefer" something is to like it better, but be willing to accept
other options if the preference can't be accommodated. Here, it seems
like preferFullAgg = false prevents consideration of full aggregation.
So it's really more like allowFullAgg, or, maybe better,
try_full_aggregation. Also, try_partition_wise_grouping has a
variable isPartialAgg which always ends up getting set to
!preferFullAgg. Having two Boolean variables which are always set to
the opposite of each other isn't good. To add to the confusion, the
code following the place where isPartialAgg is set sometimes refers to
isPartialAgg and sometimes refers to preferFullAgg.

I think the comments in this patch still need a good bit of work.
They tend to explain what the code does rather than the reason it does
it, and they tend to speak vaguely rather than precisely about things
happening in other places. For example, consider the need to set
partial_pathlist = NIL in create_grouping_paths(). Here's the existing
comment:

/*
* We've been using the partial pathlist for the grouped relation to hold
* partially aggregated paths, but that's actually a little bit bogus
* because it's unsafe for later planning stages -- like ordered_rel ---
* to get the idea that they can use these partial paths as if they didn't
* need a FinalizeAggregate step. Zap the partial pathlist at this stage
* so we don't get confused.
*/

Here's your comment about the same hazard:

+        * For full aggregation, at this point we are already done with the
+        * finalization step and thus partial paths are no more needed. Keeping
+        * those will lead to some unwanted result later in the planning stage.
+        * Thus like create_grouping_paths(), clear them out.

Notice that your comment says that it will create an "unwanted
result" and that this result will happen "later", whereas the existing
comment is a lot more specific. It says exactly what the problem is
(FinalizeAggregate needed) and where the confusion will happen
(ordered_rel). Some other examples of comments with similar problems:

+        * If there are partial subpaths for parallelism, then we need to add
+        * gather path on top of the append. However, we only do this when full
+        * aggregation is required.  For partial aggregation this can be done at
+        * later stage.

Doesn't really explain why we're doing any of those things, just says
that we are. Also, what later stage?

+        * For non-partial aggregation path, just need to add given append path to
+        * a grouped_rel.  Also, if caller requested a partial aggregation only,
+        * skip finalize step.

Again, why?

+        * Add all collected append paths into the grouped_rel.  For partial
+        * aggregation mode we need to add a finalize agg node over an append
+        * path.

Why?

+ /* Similarly, for partial paths. Here we need to add gather node too. */

Why?

+                * However, in partial aggregation mode this is done at later stage,
+                * so skip it.

When?

Here's an example of a much better comment you wrote:

+        * In find_ec_member_for_tle(), child EC members are ignored if they don't
+        * belong to the given relids. Thus, if this sort path is based on a child
+        * relation, we must pass the relids of it. Otherwise, we will end-up into
+        * an error requiring pathkey item.

I haven't studied this patch in enough depth yet to figure out whether
I think that makes sense, but clearly when I go to do that this
comment is going to be a big help in figuring it out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#74

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#73)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Feb 1, 2018 at 1:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 29, 2018 at 3:42 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new patch set and rebased it on latest HEAD.

I strongly dislike add_single_path_to_append_rel. It adds branches
and complexity to code that is already very complex. Most
importantly, why are we adding paths to fields in
OtherUpperPathExtraData *extra instead of adding them to the path list
of some RelOptInfo? If we had an appropriate RelOptInfo to which we
could add the paths, then we could make this simpler.

If I understand correctly, the reason you're doing it this way is
because we have no place to put partially-aggregated, non-partial
paths. If we only needed to worry about the parallel query case, we
could just build an append of partially-aggregated paths for each
child and stick it into the grouped rel's partial pathlist, just as we
already do for regular parallel aggregation. There's no reason why
add_paths_to_grouping_rel() needs to care about the difference between a
Partial Aggregate on top of whatever and an Append each branch of
which is a Partial Aggregate on top of whatever. However, this won't
work for non-partial paths, because add_paths_to_grouping_rel() needs
to put paths into the grouped rel's pathlist -- and we can't mix
together partially-aggregated paths and fully-aggregated paths in the
same path list.

Yes.

But, really, the way we're using grouped_rel->partial_pathlist right
now is an awful hack. What I'm thinking we could do is introduce a
new UpperRelationKind called UPPERREL_PARTIAL_GROUP_AGG, coming just
before UPPERREL_GROUP_AGG. Without partition-wise aggregate,
partially_grouped_rel's pathlist would always be NIL, and its partial
pathlist would be constructed using the logic in
add_partial_paths_to_grouping_rel, which would need renaming. Then,
add_paths_to_grouping_rel would use paths from input_rel when doing
non-parallel aggregation and paths from partially_grouped_rel when
doing parallel aggregation. This would eliminate the ugly
grouped_rel->partial_pathlist = NIL assignment at the bottom of
create_grouping_paths(), because the grouped_rel's partial_pathlist
would never have been (bogusly) populated in the first place, and
hence would not need to be reset. All of these changes could be made
via a preparatory patch.

I wrote a patch for this (on current HEAD) and attached separately here.
Please have a look.

I still do not fully understand how we are going to pass those to
add_paths_to_append_rel(). I need to look into it more deeply, though.

Then the main patch needs to worry about four cases:

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key. This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist. The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition. An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

For any partial aggregation we need to add the finalization step after we
are done with the Append, i.e. after add_paths_to_append_rel(). Given that,
we need to replicate the logic of sticking the Gather and FinalizeAggregate
steps on top at a later stage. This is exactly what is done in
create_partition_agg_paths(). Am I missing something here?

2. Parallel partition-wise aggregate, grouping key contains partition
key. For the most part, this is no different from case #1. We won't
have groups spanning different partitions in this case, but we might
have groups spanning different workers, so we still need a
FinalizeAggregate step. As an exception, Gather -> Parallel Append ->
[non-partial Aggregate path] would give us a way of doing aggregation
in parallel without a separate Finalize step. I'm not sure if we want
to consider that to be in scope for this patch. If we do, then we'd
add the Parallel Append path to grouped_rel->partial_pathlist. Then,
we could stick Gather (Merge) on top of it to produce a path for
grouped_rel->pathlist using generate_gather_paths(); alternatively, it
can be used by upper planning steps -- something we currently can't
ever make work with parallel aggregation.

3. Non-parallel partition-wise aggregate, grouping key contains
partition key. Build Append paths from the children of grouped_rel
and add them to grouped_rel->pathlist.

Yes.

4. Non-parallel partition-wise aggregate, grouping key doesn't contain
partition key. Build Append paths from the children of
partially_grouped_rel and add them to partially_grouped_rel->pathlist.
Also add code to generate paths for grouped_rel->pathlist by sticking
a FinalizeAggregate step on top of each path from
partially_grouped_rel->pathlist.

Yes, this is done in create_partition_agg_paths().
create_partition_agg_paths() basically adds a Gather path if required, and
then finalizes it, again only if required. These steps are similar to their
counterpart in add_paths_to_grouping_rel(), which does Gather + finalization.

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.
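
As a minimal sketch of the partial-then-finalize mechanism described above, assuming a toy count(*) aggregate over small integer grouping keys: each partition is aggregated separately, and a finalize step combines the per-partition partial counts. All toy_* names and TOY_MAX_KEY are hypothetical and are not part of PostgreSQL.

```c
#include <stddef.h>

#define TOY_MAX_KEY 8	/* assumption: grouping keys lie in [0, TOY_MAX_KEY) */

/*
 * Partial aggregate for one partition: count rows per grouping key.
 * This plays the role of a per-partition Partial Aggregate node.
 */
static void
toy_partial_count(const int *keys, size_t nrows, long partial[TOY_MAX_KEY])
{
	size_t		i;

	for (i = 0; i < TOY_MAX_KEY; i++)
		partial[i] = 0;
	for (i = 0; i < nrows; i++)
		partial[keys[i]]++;
}

/*
 * Finalize step: fold one partition's partial counts into the running
 * totals.  The same group may arrive from several partitions, which is
 * why the partial results have to be merged.
 */
static void
toy_combine_count(long totals[TOY_MAX_KEY], const long partial[TOY_MAX_KEY])
{
	size_t		i;

	for (i = 0; i < TOY_MAX_KEY; i++)
		totals[i] += partial[i];
}
```

Calling toy_partial_count() once per partition and then toy_combine_count() on each result gives the same totals as counting over the concatenated rows; the only difference is that a key present in several partitions produces several partial groups that must be merged.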

I think this would probably eliminate some other problems I see in the
existing design as well. For example, create_partition_agg_paths
doesn't consider using Gather Merge, but that might be a win.

An Append path is always unsorted and has no pathkeys. Thus Gather Merge
over an Append path seems infeasible, doesn't it?

With
the design above, I think you never need to call create_gather_path()
anywhere. In case #1, the existing code takes care of it. In the
special case mentioned under #2, if we chose to support that,
generate_gather_paths() would take care of it. Both of those places
already know about Gather Merge.

I don't understand how exactly; I will take a more careful look at this.

On another note, I found preferFullAgg to be wicked confusing. To
"prefer" something is to like it better, but be willing to accept
other options if the preference can't be accommodated. Here, it seems
like preferFullAgg = false prevents consideration of full aggregation.
So it's really more like allowFullAgg, or, maybe better,
try_full_aggregation. Also, try_partition_wise_grouping has a
variable isPartialAgg which always ends up getting set to
!preferFullAgg. Having two Boolean variables which are always set to
the opposite of each other isn't good. To add to the confusion, the
code following the place where isPartialAgg is set sometimes refers to
isPartialAgg and sometimes refers to preferFullAgg.

I will have a look at this and at the commenting part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partially_grouped_rel.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b..8b8a567 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -186,15 +186,18 @@ static PathTarget *make_sort_input_target(PlannerInfo *root,
 static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
+						  RelOptInfo *grouped_rel,
+						  RelOptInfo *partially_grouped_rel,
+						  PathTarget *target,
 						  PathTarget *partial_grouping_target,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
 						  double dNumGroups, List *havingQual);
-static void add_partial_paths_to_grouping_rel(PlannerInfo *root,
+static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partial_grouped_rel,
 								  PathTarget *target,
 								  PathTarget *partial_grouping_target,
 								  AggClauseCosts *agg_partial_costs,
@@ -3627,6 +3630,7 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
+	RelOptInfo *partially_grouped_rel;
 	PathTarget *partial_grouping_target = NULL;
 	AggClauseCosts agg_partial_costs;	/* parallel only */
 	AggClauseCosts agg_final_costs; /* parallel only */
@@ -3637,6 +3641,8 @@ create_grouping_paths(PlannerInfo *root,
 
 	/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
 	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
+	partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
+											NULL);
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
@@ -3817,7 +3823,8 @@ create_grouping_paths(PlannerInfo *root,
 								 &agg_final_costs);
 		}
 
-		add_partial_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+		add_paths_to_partial_grouping_rel(root, input_rel, grouped_rel,
+										  partially_grouped_rel, target,
 										  partial_grouping_target,
 										  &agg_partial_costs, &agg_final_costs,
 										  gd, can_sort, can_hash,
@@ -3825,7 +3832,8 @@ create_grouping_paths(PlannerInfo *root,
 	}
 
 	/* Build final grouping paths */
-	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
+							  partially_grouped_rel, target,
 							  partial_grouping_target, agg_costs,
 							  &agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
@@ -5859,7 +5867,9 @@ get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
  */
 static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
+						  RelOptInfo *grouped_rel,
+						  RelOptInfo *partially_grouped_rel,
+						  PathTarget *target,
 						  PathTarget *partial_grouping_target,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -5870,6 +5880,12 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	ListCell   *lc;
 
+	/*
+	 * Parallel aggregation's partial paths must be stored in a
+	 * partially_grouped_rel and not in a grouped_rel.
+	 */
+	Assert(grouped_rel->partial_pathlist == NIL);
+
 	if (can_sort)
 	{
 		/*
@@ -5945,10 +5961,13 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * Now generate a complete GroupAgg Path atop of the cheapest partial
 		 * path.  We can do this using either Gather or Gather Merge.
 		 */
-		if (grouped_rel->partial_pathlist)
+		if (partially_grouped_rel->partial_pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
-			double		total_groups = path->rows * path->parallel_workers;
+			Path	   *path;
+			double		total_groups;
+
+			path = (Path *) linitial(partially_grouped_rel->partial_pathlist);
+			total_groups = path->rows * path->parallel_workers;
 
 			path = (Path *) create_gather_path(root,
 											   grouped_rel,
@@ -5999,7 +6018,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 */
 			if (parse->groupClause != NIL && root->group_pathkeys != NIL)
 			{
-				foreach(lc, grouped_rel->partial_pathlist)
+				foreach(lc, partially_grouped_rel->partial_pathlist)
 				{
 					Path	   *subpath = (Path *) lfirst(lc);
 					Path	   *gmpath;
@@ -6107,9 +6126,11 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * again, we'll only do this if it looks as though the hash table
 		 * won't exceed work_mem.
 		 */
-		if (grouped_rel->partial_pathlist)
+		if (partially_grouped_rel->partial_pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
+			Path	   *path;
+
+			path = (Path *) linitial(partially_grouped_rel->partial_pathlist);
 
 			hashaggtablesize = estimate_hashagg_tablesize(path,
 														  agg_final_costs,
@@ -6143,15 +6164,16 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 }
 
 /*
- * add_partial_paths_to_grouping_rel
+ * add_paths_to_partial_grouping_rel
  *
  * Add partial paths to grouping relation.  These paths are not fully
  * aggregated; a FinalizeAggregate step is still required.
  */
 static void
-add_partial_paths_to_grouping_rel(PlannerInfo *root,
+add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
 								  PathTarget *target,
 								  PathTarget *partial_grouping_target,
 								  AggClauseCosts *agg_partial_costs,
@@ -6199,7 +6221,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 													 -1.0);
 
 				if (parse->hasAggs)
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_agg_path(root,
 													 grouped_rel,
 													 path,
@@ -6211,7 +6233,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 													 agg_partial_costs,
 													 dNumPartialGroups));
 				else
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_group_path(root,
 													   grouped_rel,
 													   path,
@@ -6239,7 +6261,7 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 		 */
 		if (hashaggtablesize < work_mem * 1024L)
 		{
-			add_partial_path(grouped_rel, (Path *)
+			add_partial_path(partially_grouped_rel, (Path *)
 							 create_agg_path(root,
 											 grouped_rel,
 											 cheapest_partial_path,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6bf68f3..a10b150 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,8 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_PARTIAL_GROUP_AGG,	/* result of partial grouping/aggregation, if
+								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
 	UPPERREL_WINDOW,			/* result of window functions, if any */
 	UPPERREL_DISTINCT,			/* result of "SELECT DISTINCT", if any */

#75

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#74)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Feb 1, 2018 at 8:59 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I wrote a patch for this (on current HEAD) and attached separately here.
Please have a look.

Yes, this is approximately what I had in mind, though it needs more
work (e.g. it doesn't remove the clearing of the grouped_rel's
partial_pathlist, which should no longer be necessary; also, it needs
substantial comment updates).

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key. This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist. The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition. An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

For any partial aggregation, we need to add a finalization step after we are
done with the Append, i.e. after add_paths_to_append_rel(). Given that, we
need to replicate the logic of sticking a Gather and FinalizeAggregate step
on top at a later stage. This is exactly what is done in
create_partition_agg_paths(). Am I missing something here?

The problem is that create_partition_agg_paths() is doing *exactly* the
same thing that add_paths_to_grouping_rel() is already doing inside
the blocks that say if (grouped_rel->partial_pathlist). We don't need
two copies of that code. Both of those places expect to take a
partial path that has been partially aggregated and produce a
non-partial path that is fully aggregated. We do not need or want two
copies of that code.

Here's another way to look at it. We have four kinds of things.

1. Partially aggregated partial paths
2. Partially aggregated non-partial paths
3. Fully aggregated partial paths
4. Fully aggregated non-partial paths

The current code only ever generates paths of type #1 and #4; this
patch will add paths of type #2 as well, and maybe also type #3. But
the way you've got it, the existing paths of type #1 go into the
grouping_rel's partial_pathlist, and the new paths of type #1 go into
the OtherUpperPathExtraData's partial_paths list. Maybe there's a
good reason why we should keep them separate, but I'm inclined to
think they should all be going into the same list.

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.

I think this would probably eliminate some other problems I see in the
existing design as well. For example, create_partition_agg_paths
doesn't consider using Gather Merge, but that might be a win.

An Append path is always unsorted and has no pathkeys. Thus Gather Merge over
an Append path seems infeasible, doesn't it?

We currently never generate an Append path with pathkeys, but we do
generate MergeAppend paths with pathkeys, as in the following example:

rhaas=# create table foo (a int, b text) partition by range (a);
CREATE TABLE
rhaas=# create index on foo (a);
CREATE INDEX
rhaas=# create table foo1 partition of foo for values from (0) to (1000000);
CREATE TABLE
rhaas=# create table foo2 partition of foo for values from (1000000) to (2000000);
CREATE TABLE
rhaas=# select * from foo foo order by a;
 a | b
---+---
(0 rows)
rhaas=# explain select * from foo foo order by a;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Merge Append  (cost=0.32..145.47 rows=2540 width=36)
   Sort Key: foo.a
   ->  Index Scan using foo1_a_idx on foo1 foo  (cost=0.15..63.20 rows=1270 width=36)
   ->  Index Scan using foo2_a_idx on foo2 foo_1  (cost=0.15..63.20 rows=1270 width=36)
(4 rows)

Actually, in this example, the MergeAppend could be safely converted
into an Append, because the partitions are in bound order, and
somebody already proposed a patch for that.

The point is that we want to be able to get plans like this:

Finalize GroupAggregate
  -> Gather Merge
       -> MergeAppend
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t1
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t2
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t3

If we only consider Gather, not Gather Merge, when turning a partially
aggregated partial path into a non-partial path, then we end up having
to insert a Sort node if we want to perform a Finalize GroupAggregate
step.
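
To illustrate why an order-preserving gather avoids the extra Sort, here is a minimal sketch, assuming two workers that each emit partial groups already sorted by key. Merging the two streams in key order keeps equal keys adjacent, so the finalize step can combine them in a single pass. ToyPartialGroup and toy_merge_finalize() are hypothetical names, and the real executor nodes are considerably more general.

```c
#include <stddef.h>

/* One partially aggregated group: a grouping key and a partial count. */
typedef struct
{
	int			key;
	long		count;
} ToyPartialGroup;

/*
 * Merge two key-sorted streams of partial groups (the role played by
 * Gather Merge) and combine adjacent entries with equal keys (the role
 * played by Finalize GroupAggregate).  "out" must have room for na + nb
 * entries.  Returns the number of finalized groups written to "out".
 */
static size_t
toy_merge_finalize(const ToyPartialGroup *a, size_t na,
				   const ToyPartialGroup *b, size_t nb,
				   ToyPartialGroup *out)
{
	size_t		i = 0;
	size_t		j = 0;
	size_t		n = 0;

	while (i < na || j < nb)
	{
		ToyPartialGroup next;

		/* Pick the smaller head key, so the output stays sorted. */
		if (j >= nb || (i < na && a[i].key <= b[j].key))
			next = a[i++];
		else
			next = b[j++];

		/* Sorted input guarantees equal keys are adjacent. */
		if (n > 0 && out[n - 1].key == next.key)
			out[n - 1].count += next.count;
		else
			out[n++] = next;
	}
	return n;
}
```

With a plain concatenation of the two streams instead of the ordered merge, equal keys would no longer be adjacent, and the single-pass combine would need its input sorted first.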

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#76

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#75)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Feb 2, 2018 at 1:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 1, 2018 at 8:59 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I wrote a patch for this (on current HEAD) and attached separately here.
Please have a look.

Yes, this is approximately what I had in mind, though it needs more
work (e.g. it doesn't remove the clearing of the grouped_rel's
partial_pathlist, which should no longer be necessary; also, it needs
substantial comment updates).

That was just a quick patch to make sure this is what you meant.
Yes, it needs some more work, as you suggested, as well as comment updates.

1. Parallel partition-wise aggregate, grouping key doesn't contain
partition key. This should just be a matter of adding additional
Append paths to partially_grouped_rel->partial_pathlist. The existing
code already knows how to stick a Gather and FinalizeAggregate step on
top of that, and I don't see why that logic would need any
modification or addition. An Append of child partial-grouping paths
should be producing the same output as a partial grouping of an
Append, except that the former case might produce more separate groups
that need to be merged; but that should be OK: we can just throw all
the paths into the same path list and let the cheapest one win.

For any partial aggregation, we need to add a finalization step after we are
done with the Append, i.e. after add_paths_to_append_rel(). Given that, we
need to replicate the logic of sticking a Gather and FinalizeAggregate step
on top at a later stage. This is exactly what is done in
create_partition_agg_paths(). Am I missing something here?

The problem is that create_partition_agg_paths() is doing *exactly* the
same thing that add_paths_to_grouping_rel() is already doing inside
the blocks that say if (grouped_rel->partial_pathlist). We don't need
two copies of that code. Both of those places expect to take a
partial path that has been partially aggregated and produce a
non-partial path that is fully aggregated. We do not need or want two
copies of that code.

OK. Got it.

Will try to find a common place for them and will also check how it goes
with your suggested design change.

Here's another way to look at it. We have four kinds of things.

1. Partially aggregated partial paths
2. Partially aggregated non-partial paths
3. Fully aggregated partial paths
4. Fully aggregated non-partial paths

The current code only ever generates paths of type #1 and #4; this
patch will add paths of type #2 as well, and maybe also type #3. But
the way you've got it, the existing paths of type #1 go into the
grouping_rel's partial_pathlist, and the new paths of type #1 go into
the OtherUpperPathExtraData's partial_paths list. Maybe there's a
good reason why we should keep them separate, but I'm inclined to
think they should all be going into the same list.

The new paths are specific to partition-wise aggregates, and I thought it
better to keep them separate, without interfering with grouped_rel's
pathlist/partial_pathlist. And as you said, I didn't find a better place
than its own structure.

Overall, what I'm trying to argue for here is making this feature look
less like its own separate thing and more like part of the general
mechanism we've already got: partial paths would turn into regular
paths via generate_gather_paths(), and partially aggregated paths
would turn into fully aggregated paths by adding FinalizeAggregate.
The existing special case that allows us to build a non-partial, fully
aggregated path from a partial, partially-aggregated path would be
preserved.

I think this would probably eliminate some other problems I see in the
existing design as well. For example, create_partition_agg_paths
doesn't consider using Gather Merge, but that might be a win.

An Append path is always unsorted and has no pathkeys. Thus Gather Merge over
an Append path seems infeasible, doesn't it?

We currently never generate an Append path with pathkeys, but we do
generate MergeAppend paths with pathkeys, as in the following example:

rhaas=# create table foo (a int, b text) partition by range (a);
CREATE TABLE
rhaas=# create index on foo (a);
CREATE INDEX
rhaas=# create table foo1 partition of foo for values from (0) to (1000000);
CREATE TABLE
rhaas=# create table foo2 partition of foo for values from (1000000) to (2000000);
CREATE TABLE
rhaas=# select * from foo foo order by a;
 a | b
---+---
(0 rows)
rhaas=# explain select * from foo foo order by a;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Merge Append  (cost=0.32..145.47 rows=2540 width=36)
   Sort Key: foo.a
   ->  Index Scan using foo1_a_idx on foo1 foo  (cost=0.15..63.20 rows=1270 width=36)
   ->  Index Scan using foo2_a_idx on foo2 foo_1  (cost=0.15..63.20 rows=1270 width=36)
(4 rows)

Actually, in this example, the MergeAppend could be safely converted
into an Append, because the partitions are in bound order, and
somebody already proposed a patch for that.

The point is that we want to be able to get plans like this:

Finalize GroupAggregate
  -> Gather Merge
       -> MergeAppend
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t1
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t2
            -> Partial GroupAggregate
                 -> Parallel Index Scan on t3

add_paths_to_append_rel() -> generate_mergeappend_paths() does not consider
partial_pathlist. Thus we will never see a MergeAppend over the parallel
scans in partial_pathlist, and so a plan like:
-> Gather Merge
     -> MergeAppend
is not possible with current HEAD.

Are you suggesting we should implement that here? I think that is a
separate task in itself.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#77

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#76)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Feb 2, 2018 at 8:25 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

The problem is that create_partition_agg_paths() is doing *exactly* the
same thing that add_paths_to_grouping_rel() is already doing inside
the blocks that say if (grouped_rel->partial_pathlist). We don't need
two copies of that code. Both of those places expect to take a
partial path that has been partially aggregated and produce a
non-partial path that is fully aggregated. We do not need or want two
copies of that code.

OK. Got it.

Will try to find a common place for them and will also check how it goes
with your suggested design change.

Here's another way to look at it. We have four kinds of things.

1. Partially aggregated partial paths
2. Partially aggregated non-partial paths
3. Fully aggregated partial paths
4. Fully aggregated non-partial paths

So in the new scheme I'm proposing, you've got a partially_grouped_rel
and a grouped_rel. So all paths of type #1 go into
partially_grouped_rel->partial_pathlist, paths of type #2 go into
partially_grouped_rel->pathlist, type #3 (if we have any) goes into
grouped_rel->partial_pathlist, and type #4 goes into
grouped_rel->pathlist.
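
The mapping just described can be written down as a small lookup: a minimal sketch, assuming each path is characterized by two flags, whether it is fully aggregated and whether it is a partial (parallel) path. toy_destination_name() is a hypothetical helper for illustration only; the planner has no such function.

```c
#include <stdbool.h>

/*
 * Return the list a grouping path belongs in under the proposed scheme.
 *
 *   type #1: partially aggregated, partial path
 *   type #2: partially aggregated, non-partial path
 *   type #3: fully aggregated, partial path
 *   type #4: fully aggregated, non-partial path
 */
static const char *
toy_destination_name(bool fully_aggregated, bool is_partial_path)
{
	if (!fully_aggregated)
		return is_partial_path
			? "partially_grouped_rel->partial_pathlist"	/* type #1 */
			: "partially_grouped_rel->pathlist";		/* type #2 */

	return is_partial_path
		? "grouped_rel->partial_pathlist"				/* type #3 */
		: "grouped_rel->pathlist";						/* type #4 */
}
```

The first flag selects the rel and the second selects the list within it, so the two choices are independent of each other.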

add_paths_to_append_rel() -> generate_mergeappend_paths() does not consider
partial_pathlist. Thus we will never see a MergeAppend over the parallel
scans in partial_pathlist, and so a plan like:
-> Gather Merge
     -> MergeAppend
is not possible with current HEAD.

Are you suggesting we should implement that here? I think that is a
separate task in itself.

Oh, I didn't realize that wasn't working already. I agree that it's a
separate task from this patch, but it's really too bad that it doesn't
already work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#78

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#77)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi,

In the attached version, I have rebased my changes over the new
partially_grouped_rel design. The preparatory changes that add
partially_grouped_rel are in 0001.

Also, to minimize duplication of the finalization code, I have refactored it
into two separate functions, finalize_sorted_partial_agg_path() and
finalize_hashed_partial_agg_path(). I need two functions because the
current path creation order is:
Sort Agg Path
Sort Agg Path - Parallel Aware (Finalization needed here)
Hash Agg Path
Hash Agg Path - Parallel Aware (Finalization needed here)
If we club those finalizations together, the path creation order will
change, which may change existing plans.
Let me know if that's OK; if so, I will merge them together, as they are
distinct anyway. These changes are part of 0002.

0003 - 0006 are refactoring patches as before.

0007 is the main patch per the new design. I have removed
create_partition_agg_paths() altogether, as the finalization code is reused.
Also, I renamed preferFullAgg to forcePartialAgg, as we must force a
partial path from the nested level if the parent is doing partial
aggregation. add_single_path_to_append_rel() no longer exists, and
there is no longer any need to pass OtherUpperPathExtraData to
add_paths_to_append_rel().

0008 - 0009 are the test case and postgres_fdw changes.

Please have a look at the new changes and let me know if I missed anything.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#79

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#78)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

I was testing this patch with the TPC-H benchmark and came across the
following results:

Q1 completes in 229 secs with the patch and in 66 secs without it. It looks
like with this patch the time of the parallel seq scan itself is elevated
for some of the partitions. Notice that for partitions lineitem_3,
lineitem_7, lineitem_10, and lineitem_5 it is some 13 secs, whereas it was
around 5 secs on head.

Q6 completes in some 7 secs with the patch and takes 4 secs without it. This
is mainly because, with the new parallel append, the parallel operator below
it (parallel index scan in this case) is not used; on head, however, it was
an append of all the parallel index scans, which saved quite some time.

Q18 takes some 390 secs with patch and some 147 secs without it.

The experimental setup for these tests is as follows,
work_mem = 500MB
shared_buffers = 10GB
effective_cache_size = 4GB
seq_page_cost = random_page_cost = 0.01
enable_partition_wise_join = off

Partitioning info:
10 partitions each on the lineitem and orders tables, with the partitioning
keys being l_orderkey and o_orderkey respectively.

Please find attached the explain analyze outputs for each of the reported
queries.
--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#80

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Rafia Sabih (#79)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Feb 13, 2018 at 12:37 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
wrote:

I was testing this patch with the TPC-H benchmark and came across the
following results:

Thanks Rafia for testing this with TPC-H benchmarking.

Q1 completes in 229 secs with the patch and in 66 secs without it. It looks
like with this patch the time of the parallel seq scan itself is elevated
for some of the partitions. Notice that for partitions lineitem_3,
lineitem_7, lineitem_10, and lineitem_5 it is some 13 secs, whereas it was
around 5 secs on head.

Q6 completes in some 7 secs with the patch and takes 4 secs without it. This
is mainly because, with the new parallel append, the parallel operator below
it (parallel index scan in this case) is not used; on head, however, it was
an append of all the parallel index scans, which saved quite some time.

I see that partition-wise aggregate plan too uses parallel index, am I
missing something?

Q18 takes some 390 secs with patch and some 147 secs without it.

This looks strange. This patch set does not touch parallel or seq scan as
such. I am not sure why this is happening. All these three queries explain
plan shows much higher execution time for parallel/seq scan.

However, do you see similar behaviour with patches applied,
"enable_partition_wise_agg = on" and "enable_partition_wise_agg = off" ?

Also, does rest of the queries perform better with partition-wise
aggregates?

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#81

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#80)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Feb 13, 2018 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

I see that partition-wise aggregate plan too uses parallel index, am I
missing something?

You're right, I missed that, oops.

Q18 takes some 390 secs with patch and some 147 secs without it.

This looks strange. This patch set does not touch parallel or seq scan as
such. I am not sure why this is happening. All these three queries explain
plan shows much higher execution time for parallel/seq scan.

Yeah strange it is.

However, do you see similar behaviour with patches applied,
"enable_partition_wise_agg = on" and "enable_partition_wise_agg = off" ?

I tried that for query 18, with patch and enable_partition_wise_agg = off,
query completes in some 270 secs. You may find the explain analyse output
for it in the attached file. I noticed that on head the query plan had
parallel hash join however with patch and no partition-wise agg it is using
nested loop joins. This might be the issue.

Also, does rest of the queries perform better with partition-wise
aggregates?

As far as this setting goes, there wasn't any other query using
partition-wise-agg, so, no.

BTW, just an FYI, this experiment is on scale factor 20.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#82

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Rafia Sabih (#81)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Feb 14, 2018 at 12:17 PM, Rafia Sabih <rafia.sabih@enterprisedb.com>
wrote:

On Tue, Feb 13, 2018 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

I see that partition-wise aggregate plan too uses parallel index, am I
missing something?

You're right, I missed that, oops.

Q18 takes some 390 secs with patch and some 147 secs without it.

This looks strange. This patch set does not touch parallel or seq scan as
such. I am not sure why this is happening. All these three queries explain
plan shows much higher execution time for parallel/seq scan.

Yeah strange it is.

Off-list I have asked Rafia to provide me the perf machine access where she
is doing this bench-marking to see what's going wrong.
Thanks Rafia for the details.

What I have observed is that there are two source trees, one with HEAD and
the other with HEAD+PWA, but they were configured with different switches.
The HEAD+PWA tree was built with CFLAGS="-ggdb3 -O0" CXXFLAGS="-ggdb3 -O0"
in addition to the switches used for the HEAD tree, i.e. HEAD+PWA was built
with debugging enabled and compiler optimization disabled (-O0), which
accounts for the slowness.

I have run EXPLAIN for these three queries on both source trees with
exactly the same configuration switches, and I don't find any slowness with
the PWA patch-set.

So it would be good if you could re-run the benchmark with the same
configuration switches on both source trees and share the results.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#83

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#78)

Re: [HACKERS] Partition-wise aggregation/grouping

Commit 2fb1abaeb016aeb45b9e6d0b81b7a7e92bb251b9 changed
enable_partition_wise_join to enable_partitionwise_join. This patch
too should use enable_partitionwise_agg instead of
enable_partition_wise_agg.

#84

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#78)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

0003 - 0006 are refactoring patches as before.

I have committed 0006 with some modifications. In particular, [1] I
revised the comments and formatting; [2] I made cost_merge_append()
add cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER in lieu of, rather
than in addition to, cpu_operator_cost; and [3] I modified the
regression test so that the overall plan shape didn't change.

[2] was proposed upthread, but not adopted. I had the same thought
while reading the patch (having forgotten the previous discussion) and
that seemed like a good enough reason to do it according to the
previous proposal. If there is a good reason to think MergeAppend
needs that extra cost increment to be fairly-costed, I don't see it on
this thread.

[3] was also remarked upon upthread -- Ashutosh mentioned that the
change in plan shape was "sad" but there was no further discussion of
the matter. I also found it sad; hence the change. This is, by the
way, an interesting illustration of how partition-wise join could
conceivably lose. Up until now I've thought that it seemed to be a
slam dunk to always win or at least break even, but if you've got a
relatively unselective join, such that the output is much larger than
either input, then doing the join partition-wise means putting all of
the output rows through an Append node, whereas doing it the normal
way means putting only the input rows through Append nodes. If the
smaller number of rows being joined at one time doesn't help -- e.g.
all of the inner rows across all partitions fit in a tiny little hash
table -- then we're just feeding more rows through the Append for no
gain. Not a common case, perhaps, but not impossible.
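
To put rough numbers on the Append argument in [3], here is a minimal
sketch, not planner code. It assumes the per-row Append charge is
cpu_tuple_cost * APPEND_CPU_COST_MULTIPLIER, with cpu_tuple_cost at its
default of 0.01 and the multiplier assumed to be 0.5. All sketch_* names
are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumed values: default cpu_tuple_cost, and an assumed multiplier of 0.5. */
#define SKETCH_CPU_TUPLE_COST 0.01
#define SKETCH_APPEND_CPU_COST_MULTIPLIER 0.5

/* CPU cost charged for passing 'rows' tuples through an Append node. */
double
sketch_append_cpu_cost(double rows)
{
    assert(rows >= 0.0);
    return SKETCH_CPU_TUPLE_COST * SKETCH_APPEND_CPU_COST_MULTIPLIER * rows;
}

/* Plain join: only the two inputs pass through Append nodes below the join. */
double
sketch_plain_join_append_rows(double outer_rows, double inner_rows)
{
    return outer_rows + inner_rows;
}

/* Partition-wise join: every join output row passes through the Append on top. */
double
sketch_partitionwise_join_append_rows(double output_rows)
{
    return output_rows;
}
```

With two 1000-row inputs and an unselective join producing 100000 rows,
the plain plan pushes 2000 rows through Append nodes while the
partition-wise plan pushes all 100000 through one. Unless the per-partition
joins are themselves cheaper, that extra Append work is pure overhead.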

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#85

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#78)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

In this attached version, I have rebased my changes over new design of
partially_grouped_rel. The preparatory changes of adding
partially_grouped_rel are in 0001.

I spent today hacking in 0001; results attached. The big change from
your version is that this now uses generate_gather_paths() to add
Gather/Gather Merge nodes (except in the case where we sort by group
pathkeys and then Gather Merge) rather than keeping all of the bespoke
code. That turned out to be a bit less elegant than I would have liked
-- I had to add an override_rows argument to generate_gather_paths to make
it work. But overall I think this is still a big improvement, since
it lets us share code instead of duplicating it. Also, it potentially
lets us add partially-aggregated but non-parallel paths into
partially_grouped_rel->pathlist and that should Just Work; they will
get the Finalize Aggregate step but not the Gather. With your
arrangement that wouldn't work.
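
The override_rows mechanism can be sketched in isolation roughly as
follows. This is a simplified illustration rather than the patch itself:
SketchPath and the sketch_* functions are hypothetical stand-ins, and the
only convention borrowed from the patch is that a NULL rows pointer means
"use the rel's own estimate" while a non-NULL pointer supplies an override.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's Path; only the fields used here. */
typedef struct SketchPath
{
    double      rows;               /* per-worker row estimate */
    int         parallel_workers;   /* planned number of workers */
} SketchPath;

/* NULL override: fall back to the relation's estimate; otherwise use it. */
double
sketch_gather_rows(double rel_rows, const double *rows_override)
{
    return rows_override ? *rows_override : rel_rows;
}

/*
 * Mirrors the shape of the change to generate_gather_paths(): the override
 * is computed from the cheapest partial path, but only passed down when the
 * caller asked for it (the partially grouped rel has no estimate of its own).
 */
double
sketch_generate_gather_rows(double rel_rows,
                            const SketchPath *cheapest_partial,
                            int override_rows)
{
    double      rows;
    const double *rowsp = NULL;

    assert(cheapest_partial != NULL);

    if (override_rows)
        rowsp = &rows;

    rows = cheapest_partial->rows * cheapest_partial->parallel_workers;
    return sketch_gather_rows(rel_rows, rowsp);
}
```

For a scan or join rel the caller passes override_rows = false and the
rel's own estimate is used. For the partially grouped rel it passes true,
so a partial path with 250 rows per worker and 4 workers yields 1000 rows.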

Please review/test.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

partially-grouped-rel-rmh.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 57f0f594e5..0be2a73e05 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -268,7 +268,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				generate_partitionwise_join_paths(root, joinrel);
 
 				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f714247ebb..1c792a00eb 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -488,7 +488,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	 * we'll consider gathering partial paths for the parent appendrel.)
 	 */
 	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+		generate_gather_paths(root, rel, false);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2444,27 +2444,42 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * This must not be called until after we're done creating all partial paths
  * for the specified relation.  (Otherwise, add_partial_path might delete a
  * path that some GatherPath or GatherMergePath has a reference to.)
+ *
+ * If we're generating paths for a scan or join relation, override_rows will
+ * be false, and we'll just use the relation's size estimate.  When we're
+ * being called for a partially-grouped path, though, we need to override
+ * the rowcount estimate.  (It's not clear that the particular value we're
+ * using here is actually best, but the underlying rel has no estimate so
+ * we must do something.)
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
 	ListCell   *lc;
+	double		rows;
+	double	   *rowsp = NULL;
 
 	/* If there are no partial paths, there's nothing to do here. */
 	if (rel->partial_pathlist == NIL)
 		return;
 
+	/* Should we override the rel's rowcount estimate? */
+	if (override_rows)
+		rowsp = &rows;
+
 	/*
 	 * The output of Gather is always unsorted, so there's only one partial
 	 * path of interest: the cheapest one.  That will be the one at the front
 	 * of partial_pathlist because of the way add_partial_path works.
 	 */
 	cheapest_partial_path = linitial(rel->partial_pathlist);
+	rows =
+		cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
-						   NULL, NULL);
+						   NULL, rowsp);
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2479,8 +2494,9 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 		if (subpath->pathkeys == NIL)
 			continue;
 
+		rows = subpath->rows * subpath->parallel_workers;
 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
+										subpath->pathkeys, NULL, rowsp);
 		add_path(rel, &path->path);
 	}
 }
@@ -2653,7 +2669,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			generate_partitionwise_join_paths(root, rel);
 
 			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd1447c..e4f9bd4c7f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -186,19 +186,17 @@ static PathTarget *make_sort_input_target(PlannerInfo *root,
 static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
-						  PathTarget *partial_grouping_target,
+						  RelOptInfo *grouped_rel,
+						  PathTarget *target,
+						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
 						  double dNumGroups, List *havingQual);
-static void add_partial_paths_to_grouping_rel(PlannerInfo *root,
+static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
-								  RelOptInfo *grouped_rel,
-								  PathTarget *target,
-								  PathTarget *partial_grouping_target,
+								  RelOptInfo *partial_grouped_rel,
 								  AggClauseCosts *agg_partial_costs,
-								  AggClauseCosts *agg_final_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
 								  bool can_hash,
@@ -3601,6 +3599,11 @@ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
  * create_grouping_paths
  *
  * Build a new upperrel containing Paths for grouping and/or aggregation.
+ * Along the way, we also build an upperrel for Paths which are partially
+ * grouped and/or aggregated.  A partially grouped and/or aggregated path
+ * needs a FinalizeAggregate node to complete the aggregation.  Currently,
+ * the only partially grouped paths we build are also partial paths; that
+ * is, they need a Gather and then a FinalizeAggregate.
  *
  * input_rel: contains the source-data Paths
  * target: the pathtarget for the result Paths to compute
@@ -3627,7 +3630,7 @@ create_grouping_paths(PlannerInfo *root,
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
-	PathTarget *partial_grouping_target = NULL;
+	RelOptInfo *partially_grouped_rel;
 	AggClauseCosts agg_partial_costs;	/* parallel only */
 	AggClauseCosts agg_final_costs; /* parallel only */
 	double		dNumGroups;
@@ -3635,26 +3638,41 @@ create_grouping_paths(PlannerInfo *root,
 	bool		can_sort;
 	bool		try_parallel_aggregation;
 
-	/* For now, do all work in the (GROUP_AGG, NULL) upperrel */
+	/*
+	 * For now, all aggregated paths are added to the (GROUP_AGG, NULL)
+	 * upperrel.  Paths that are only partially aggregated go into the
+	 * (UPPERREL_PARTIAL_GROUP_AGG, NULL) upperrel.
+	 */
 	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
+	partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
+											NULL);
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
 	 * can't be parallel-safe, either.  Otherwise, it's parallel-safe if the
-	 * target list and HAVING quals are parallel-safe.
+	 * target list and HAVING quals are parallel-safe.  The partially grouped
+	 * relation obeys the same rules.
 	 */
 	if (input_rel->consider_parallel &&
 		is_parallel_safe(root, (Node *) target->exprs) &&
 		is_parallel_safe(root, (Node *) parse->havingQual))
+	{
 		grouped_rel->consider_parallel = true;
+		partially_grouped_rel->consider_parallel = true;
+	}
 
 	/*
-	 * If the input rel belongs to a single FDW, so does the grouped rel.
+	 * If the input rel belongs to a single FDW, so does the grouped rel. Same
+	 * for the partially_grouped_rel.
 	 */
 	grouped_rel->serverid = input_rel->serverid;
 	grouped_rel->userid = input_rel->userid;
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
+	partially_grouped_rel->serverid = input_rel->serverid;
+	partially_grouped_rel->userid = input_rel->userid;
+	partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
+	partially_grouped_rel->fdwroutine = input_rel->fdwroutine;
 
 	/*
 	 * Check for degenerate grouping.
@@ -3778,14 +3796,13 @@ create_grouping_paths(PlannerInfo *root,
 
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
-	 * partial paths; that way, later code can easily consider both parallel
-	 * and non-parallel approaches to grouping.  Note that the partial paths
-	 * we generate here are also partially aggregated, so simply pushing a
-	 * Gather node on top is insufficient to create a final path, as would be
-	 * the case for a scan/join rel.
+	 * partial paths for partially_grouped_rel; that way, later code can
+	 * easily consider both parallel and non-parallel approaches to grouping.
 	 */
 	if (try_parallel_aggregation)
 	{
+		PathTarget *partial_grouping_target;
+
 		/*
 		 * Build target list for partial aggregate paths.  These paths cannot
 		 * just emit the same tlist as regular aggregate paths, because (1) we
@@ -3794,6 +3811,7 @@ create_grouping_paths(PlannerInfo *root,
 		 * partial mode.
 		 */
 		partial_grouping_target = make_partial_grouping_target(root, target);
+		partially_grouped_rel->reltarget = partial_grouping_target;
 
 		/*
 		 * Collect statistics about aggregates for estimating costs of
@@ -3817,16 +3835,16 @@ create_grouping_paths(PlannerInfo *root,
 								 &agg_final_costs);
 		}
 
-		add_partial_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-										  partial_grouping_target,
-										  &agg_partial_costs, &agg_final_costs,
+		add_paths_to_partial_grouping_rel(root, input_rel,
+										  partially_grouped_rel,
+										  &agg_partial_costs,
 										  gd, can_sort, can_hash,
 										  (List *) parse->havingQual);
 	}
 
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-							  partial_grouping_target, agg_costs,
+							  partially_grouped_rel, agg_costs,
 							  &agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
 
@@ -3854,16 +3872,6 @@ create_grouping_paths(PlannerInfo *root,
 	/* Now choose the best path(s) */
 	set_cheapest(grouped_rel);
 
-	/*
-	 * We've been using the partial pathlist for the grouped relation to hold
-	 * partially aggregated paths, but that's actually a little bit bogus
-	 * because it's unsafe for later planning stages -- like ordered_rel ---
-	 * to get the idea that they can use these partial paths as if they didn't
-	 * need a FinalizeAggregate step.  Zap the partial pathlist at this stage
-	 * so we don't get confused.
-	 */
-	grouped_rel->partial_pathlist = NIL;
-
 	return grouped_rel;
 }
 
@@ -5996,8 +6004,9 @@ get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
  */
 static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
-						  RelOptInfo *grouped_rel, PathTarget *target,
-						  PathTarget *partial_grouping_target,
+						  RelOptInfo *grouped_rel,
+						  PathTarget *target,
+						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
@@ -6079,32 +6088,27 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 
 		/*
-		 * Now generate a complete GroupAgg Path atop of the cheapest partial
-		 * path.  We can do this using either Gather or Gather Merge.
+		 * Instead of operating directly on the input relation, we can
+		 * consider finalizing a partially aggregated path.
 		 */
-		if (grouped_rel->partial_pathlist)
+		foreach(lc, partially_grouped_rel->pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
-			double		total_groups = path->rows * path->parallel_workers;
-
-			path = (Path *) create_gather_path(root,
-											   grouped_rel,
-											   path,
-											   partial_grouping_target,
-											   NULL,
-											   &total_groups);
+			Path	   *path = (Path *) lfirst(lc);
 
 			/*
-			 * Since Gather's output is always unsorted, we'll need to sort,
-			 * unless there's no GROUP BY clause or a degenerate (constant)
-			 * one, in which case there will only be a single group.
+			 * Insert a Sort node, if required.  But there's no point in
+			 * sorting anything but the cheapest path.
 			 */
-			if (root->group_pathkeys)
+			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+			{
+				if (path != linitial(partially_grouped_rel->pathlist))
+					continue;
 				path = (Path *) create_sort_path(root,
 												 grouped_rel,
 												 path,
 												 root->group_pathkeys,
 												 -1.0);
+			}
 
 			if (parse->hasAggs)
 				add_path(grouped_rel, (Path *)
@@ -6127,70 +6131,6 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 										   parse->groupClause,
 										   havingQual,
 										   dNumGroups));
-
-			/*
-			 * The point of using Gather Merge rather than Gather is that it
-			 * can preserve the ordering of the input path, so there's no
-			 * reason to try it unless (1) it's possible to produce more than
-			 * one output row and (2) we want the output path to be ordered.
-			 */
-			if (parse->groupClause != NIL && root->group_pathkeys != NIL)
-			{
-				foreach(lc, grouped_rel->partial_pathlist)
-				{
-					Path	   *subpath = (Path *) lfirst(lc);
-					Path	   *gmpath;
-					double		total_groups;
-
-					/*
-					 * It's useful to consider paths that are already properly
-					 * ordered for Gather Merge, because those don't need a
-					 * sort. It's also useful to consider the cheapest path,
-					 * because sorting it in parallel and then doing Gather
-					 * Merge may be better than doing an unordered Gather
-					 * followed by a sort. But there's no point in considering
-					 * non-cheapest paths that aren't already sorted
-					 * correctly.
-					 */
-					if (path != subpath &&
-						!pathkeys_contained_in(root->group_pathkeys,
-											   subpath->pathkeys))
-						continue;
-
-					total_groups = subpath->rows * subpath->parallel_workers;
-
-					gmpath = (Path *)
-						create_gather_merge_path(root,
-												 grouped_rel,
-												 subpath,
-												 partial_grouping_target,
-												 root->group_pathkeys,
-												 NULL,
-												 &total_groups);
-
-					if (parse->hasAggs)
-						add_path(grouped_rel, (Path *)
-								 create_agg_path(root,
-												 grouped_rel,
-												 gmpath,
-												 target,
-												 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
-												 AGGSPLIT_FINAL_DESERIAL,
-												 parse->groupClause,
-												 havingQual,
-												 agg_final_costs,
-												 dNumGroups));
-					else
-						add_path(grouped_rel, (Path *)
-								 create_group_path(root,
-												   grouped_rel,
-												   gmpath,
-												   target,
-												   parse->groupClause,
-												   havingQual,
-												   dNumGroups));
-				}
-			}
 		}
 	}
 
@@ -6240,29 +6180,21 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		}
 
 		/*
-		 * Generate a HashAgg Path atop of the cheapest partial path. Once
-		 * again, we'll only do this if it looks as though the hash table
-		 * won't exceed work_mem.
+		 * Generate a Finalize HashAgg Path atop of the cheapest partially
+		 * grouped path. Once again, we'll only do this if it looks as though
+		 * the hash table won't exceed work_mem.
 		 */
-		if (grouped_rel->partial_pathlist)
+		if (partially_grouped_rel->pathlist)
 		{
-			Path	   *path = (Path *) linitial(grouped_rel->partial_pathlist);
+			Path	   *path;
+
+			path = (Path *) linitial(partially_grouped_rel->pathlist);
 
 			hashaggtablesize = estimate_hashagg_tablesize(path,
 														  agg_final_costs,
 														  dNumGroups);
 
 			if (hashaggtablesize < work_mem * 1024L)
-			{
-				double		total_groups = path->rows * path->parallel_workers;
-
-				path = (Path *) create_gather_path(root,
-												   grouped_rel,
-												   path,
-												   partial_grouping_target,
-												   NULL,
-												   &total_groups);
-
 				add_path(grouped_rel, (Path *)
 						 create_agg_path(root,
 										 grouped_rel,
@@ -6274,25 +6206,24 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 										 havingQual,
 										 agg_final_costs,
 										 dNumGroups));
-			}
 		}
 	}
 }
 
 /*
- * add_partial_paths_to_grouping_rel
+ * add_paths_to_partial_grouping_rel
  *
- * Add partial paths to grouping relation.  These paths are not fully
- * aggregated; a FinalizeAggregate step is still required.
+ * First, generate partially aggregated partial paths from the partial paths
+ * for the input relation, and then generate partially aggregated non-partial
+ * paths using Gather or Gather Merge.  All paths for this relation -- both
+ * partial and non-partial -- have been partially aggregated but require a
+ * subsequent FinalizeAggregate step.
  */
 static void
-add_partial_paths_to_grouping_rel(PlannerInfo *root,
+add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
-								  RelOptInfo *grouped_rel,
-								  PathTarget *target,
-								  PathTarget *partial_grouping_target,
+								  RelOptInfo *partially_grouped_rel,
 								  AggClauseCosts *agg_partial_costs,
-								  AggClauseCosts *agg_final_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
 								  bool can_hash,
@@ -6330,17 +6261,17 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 				/* Sort the cheapest partial path, if it isn't already */
 				if (!is_sorted)
 					path = (Path *) create_sort_path(root,
-													 grouped_rel,
+													 partially_grouped_rel,
 													 path,
 													 root->group_pathkeys,
 													 -1.0);
 
 				if (parse->hasAggs)
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_agg_path(root,
-													 grouped_rel,
+													 partially_grouped_rel,
 													 path,
-													 partial_grouping_target,
+													 partially_grouped_rel->reltarget,
 													 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
 													 AGGSPLIT_INITIAL_SERIAL,
 													 parse->groupClause,
@@ -6348,11 +6279,11 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 													 agg_partial_costs,
 													 dNumPartialGroups));
 				else
-					add_partial_path(grouped_rel, (Path *)
+					add_partial_path(partially_grouped_rel, (Path *)
 									 create_group_path(root,
-													   grouped_rel,
+													   partially_grouped_rel,
 													   path,
-													   partial_grouping_target,
+													   partially_grouped_rel->reltarget,
 													   parse->groupClause,
 													   NIL,
 													   dNumPartialGroups));
@@ -6376,11 +6307,11 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 		 */
 		if (hashaggtablesize < work_mem * 1024L)
 		{
-			add_partial_path(grouped_rel, (Path *)
+			add_partial_path(partially_grouped_rel, (Path *)
 							 create_agg_path(root,
-											 grouped_rel,
+											 partially_grouped_rel,
 											 cheapest_partial_path,
-											 partial_grouping_target,
+											 partially_grouped_rel->reltarget,
 											 AGG_HASHED,
 											 AGGSPLIT_INITIAL_SERIAL,
 											 parse->groupClause,
@@ -6389,6 +6320,55 @@ add_partial_paths_to_grouping_rel(PlannerInfo *root,
 											 dNumPartialGroups));
 		}
 	}
+
+	/*
+	 * If there is an FDW that's responsible for all baserels of the query,
+	 * let it consider adding partially grouped ForeignPaths.
+	 */
+	if (partially_grouped_rel->fdwroutine &&
+		partially_grouped_rel->fdwroutine->GetForeignUpperPaths)
+	{
+		FdwRoutine *fdwroutine = partially_grouped_rel->fdwroutine;
+
+		fdwroutine->GetForeignUpperPaths(root,
+										 UPPERREL_PARTIAL_GROUP_AGG,
+										 input_rel, partially_grouped_rel);
+	}
+
+	/*
+	 * Try adding Gather or Gather Merge to partial paths to produce
+	 * non-partial paths.
+	 */
+	generate_gather_paths(root, partially_grouped_rel, true);
+
+	/*
+	 * generate_gather_paths won't consider sorting the cheapest path to match
+	 * the group keys and then applying a Gather Merge node to the result;
+	 * that might be a winning strategy.
+	 */
+	if (!pathkeys_contained_in(root->group_pathkeys,
+							   cheapest_partial_path->pathkeys))
+	{
+		Path	   *path;
+		double		total_groups;
+
+		total_groups =
+			cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
+		path = (Path *) create_sort_path(root, partially_grouped_rel,
+										 cheapest_partial_path,
+										 root->group_pathkeys,
+										 -1.0);
+		path = (Path *)
+			create_gather_merge_path(root,
+									 partially_grouped_rel,
+									 path,
+									 partially_grouped_rel->reltarget,
+									 root->group_pathkeys,
+									 NULL,
+									 &total_groups);
+
+		add_path(partially_grouped_rel, path);
+	}
 }
 
 /*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index b1c63173c2..db8de2dfd0 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,8 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
+								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
 	UPPERREL_WINDOW,			/* result of window functions, if any */
 	UPPERREL_DISTINCT,			/* result of "SELECT DISTINCT", if any */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index c9e44318ad..520e3583c9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+		bool override_rows);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,

#86

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#85)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi Robert,

On Fri, Feb 23, 2018 at 2:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

In this attached version, I have rebased my changes over new design of
partially_grouped_rel. The preparatory changes of adding
partially_grouped_rel are in 0001.

I spent today hacking in 0001; results attached. The big change from
your version is that this now uses generate_gather_paths() to add
Gather/Gather Merge nodes (except in the case where we sort by group
pathkeys and then Gather Merge) rather than keeping all of the bespoke
code. That turned out to be a bit less elegant than I would have liked
-- I had to add an override_rows argument to generate_gather_paths to make
it work. But overall I think this is still a big improvement, since
it lets us share code instead of duplicating it. Also, it potentially
lets us add partially-aggregated but non-parallel paths into
partially_grouped_rel->pathlist and that should Just Work; they will
get the Finalize Aggregate step but not the Gather. With your
arrangement that wouldn't work.
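
For readers following along, here is a toy sketch of what an override_rows-style flag could mean for the Gather row estimate. It is not planner code: ToyPath and toy_gather_rows are hypothetical names, and the rule used (rows per worker times number of workers) is an assumption borrowed from the total_groups computation visible in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planner path; NOT the real PostgreSQL Path struct. */
typedef struct ToyPath
{
    double      rows;               /* rows one worker is expected to emit */
    int         parallel_workers;   /* number of workers planned for it */
} ToyPath;

/*
 * Row estimate for a Gather stacked on a partial path.  With override_rows
 * the estimate is derived from the path itself (assumed rule: rows per
 * worker times workers); otherwise the relation-level estimate is used.
 */
double
toy_gather_rows(const ToyPath *partial, double rel_rows, bool override_rows)
{
    assert(partial != NULL);
    if (override_rows)
        return partial->rows * partial->parallel_workers;
    return rel_rows;
}
```

With a partial path estimated at 100 rows per worker and 3 workers, the overriding estimate is 300 rows regardless of what the relation-level figure says.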

Please review/test.

I have reviewed and tested the patch, and here are a couple of points:

     /*
-     * If the input rel belongs to a single FDW, so does the grouped rel.
+     * If the input rel belongs to a single FDW, so does the grouped rel.  Same
+     * for the partially_grouped_rel.
      */
     grouped_rel->serverid = input_rel->serverid;
     grouped_rel->userid = input_rel->userid;
     grouped_rel->useridiscurrent = input_rel->useridiscurrent;
     grouped_rel->fdwroutine = input_rel->fdwroutine;
+    partially_grouped_rel->serverid = input_rel->serverid;
+    partially_grouped_rel->userid = input_rel->userid;
+    partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
+    partially_grouped_rel->fdwroutine = input_rel->fdwroutine;

In my earlier mail, where I posted a patch for these partially grouped
rel changes, I forgot to ask my question about this.
I was unclear about the above changes and thus passed grouped_rel whenever we
wanted to work on partially_grouped_rel to fetch the relevant details.

One idea I thought about is to memcpy the struct once we have set all
required fields for grouped_rel so that we don't have to do similar stuff
for partially_grouped_rel.

---

+             * Insert a Sort node, if required.  But there's no point in
+             * sorting anything but the cheapest path.
              */
-            if (root->group_pathkeys)
+            if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+            {
+                if (path != linitial(partially_grouped_rel->pathlist))
+                    continue;

Paths in pathlist are added by add_path(). Though the paths in pathlist are
kept sorted with the cheapest total path first, we generally use
RelOptInfo->cheapest_total_path instead of the first entry, unlike for
partial paths. But here you use the first entry, as in the partial paths case.
Would it be better to use the cheapest total path from partially_grouped_rel?
This will require calling set_cheapest on partially_grouped_rel before we call
this function.
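
A toy sketch of the idea, for illustration only: pick the cheapest path explicitly and have callers read that field, rather than relying on where a path sits in the list. ToyRel, ToyPath and toy_set_cheapest are hypothetical stand-ins and do not reproduce the real set_cheapest(), which tracks more than total cost.

```c
#include <assert.h>
#include <stddef.h>

/* Toy path carrying only a total cost. */
typedef struct ToyPath
{
    double      total_cost;
} ToyPath;

#define TOY_MAX_PATHS 8

/* Toy relation: a list of paths plus a cached "cheapest" pointer. */
typedef struct ToyRel
{
    ToyPath    *pathlist[TOY_MAX_PATHS];
    int         npaths;
    ToyPath    *cheapest_total_path;    /* set by toy_set_cheapest() */
} ToyRel;

/* Scan the whole list and remember the path with the lowest total cost. */
void
toy_set_cheapest(ToyRel *rel)
{
    assert(rel->npaths > 0);    /* a rel with no paths has no cheapest path */
    rel->cheapest_total_path = rel->pathlist[0];
    for (int i = 1; i < rel->npaths; i++)
    {
        if (rel->pathlist[i]->total_cost < rel->cheapest_total_path->total_cost)
            rel->cheapest_total_path = rel->pathlist[i];
    }
}
```

Once the field is filled in, a comparison such as path != rel->cheapest_total_path no longer depends on how the list happens to be ordered.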

Attached is a top-up patch doing this, along with a few indentation fixes.

Rest of the changes look good to me.

Once this gets in, I will re-base my other patches accordingly.

And, thanks for committing 0006.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

use_cheapest_total_path.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a0..f6b0208 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2453,7 +2453,8 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * we must do something.)
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  bool override_rows)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e4f9bd4..e8f6cc5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6101,7 +6101,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 */
 			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
 			{
-				if (path != linitial(partially_grouped_rel->pathlist))
+				if (path != partially_grouped_rel->cheapest_total_path)
 					continue;
 				path = (Path *) create_sort_path(root,
 												 grouped_rel,
@@ -6186,9 +6186,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 */
 		if (partially_grouped_rel->pathlist)
 		{
-			Path	   *path;
-
-			path = (Path *) linitial(partially_grouped_rel->pathlist);
+			Path	   *path = partially_grouped_rel->cheapest_total_path;
 
 			hashaggtablesize = estimate_hashagg_tablesize(path,
 														  agg_final_costs,
@@ -6369,6 +6367,9 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 
 		add_path(partially_grouped_rel, path);
 	}
+
+	/* Now choose the best path(s) */
+	set_cheapest(partially_grouped_rel);
 }
 
 /*
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 2011f66..94f9bb2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -54,7 +54,7 @@ extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
 extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
-		bool override_rows);
+					  bool override_rows);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,

#87

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#86)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Feb 26, 2018 at 6:38 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

One idea I thought about is to memcpy the struct once we have set all
required fields for grouped_rel so that we don't have to do similar stuff
for partially_grouped_rel.

I think that would be a poor idea. We want to copy a few specific
fields, not everything, and copying those fields is cheap, because
they are just simple assignment statements. I think memcpy()'ing the
whole structure would be using a sledgehammer to solve a problem for
which a scalpel is more suited.
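
To make the contrast concrete, here is a small sketch under stated assumptions: ToyRel is a made-up struct with the four FDW-related fields named in the patch plus one hypothetical per-rel field (npaths) that must not be shared. It is not the real RelOptInfo.

```c
#include <assert.h>
#include <string.h>

/* Toy relation: four fields we want to share, one we must not. */
typedef struct ToyRel
{
    unsigned int serverid;
    unsigned int userid;
    int         useridiscurrent;
    const void *fdwroutine;
    int         npaths;         /* per-rel state that must not be shared */
} ToyRel;

/* The scalpel: copy exactly the fields both rels should agree on. */
void
toy_copy_fdw_fields(ToyRel *dst, const ToyRel *src)
{
    assert(dst != NULL && src != NULL);
    dst->serverid = src->serverid;
    dst->userid = src->userid;
    dst->useridiscurrent = src->useridiscurrent;
    dst->fdwroutine = src->fdwroutine;
}

/* The sledgehammer: also drags npaths (and anything else) across. */
void
toy_copy_everything(ToyRel *dst, const ToyRel *src)
{
    memcpy(dst, src, sizeof(ToyRel));
}
```

The field-wise copy leaves the destination's own state alone, while the memcpy() version silently overwrites it.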

Paths in pathlist are added by add_path(). Though the paths in pathlist are
kept sorted with the cheapest total path first, we generally use
RelOptInfo->cheapest_total_path instead of the first entry, unlike for partial
paths. But here you use the first entry, as in the partial paths case. Would it
be better to use the cheapest total path from partially_grouped_rel? This will
require calling set_cheapest on partially_grouped_rel before we call this
function.

Hmm, I guess that seems like a reasonable approach, although I am not
sure it matters much either way.

Attached is a top-up patch doing this, along with a few indentation fixes.

I don't see much point to the change in generate_gather_paths -- that
line is only 77 characters long.

Committed after incorporating your other fixes and updating the
optimizer README.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#88

Rafia Sabih

rafia.sabih@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#82)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Feb 14, 2018 at 8:35 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

On Wed, Feb 14, 2018 at 12:17 PM, Rafia Sabih <
rafia.sabih@enterprisedb.com> wrote:

On Tue, Feb 13, 2018 at 6:21 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

I see that partition-wise aggregate plan too uses parallel index, am I
missing something?

You're right, I missed that, oops.

Q18 takes some 390 secs with patch and some 147 secs without it.

This looks strange. This patch set does not touch parallel or seq scan
as such. I am not sure why this is happening. All these three queries
explain plan shows much higher execution time for parallel/seq scan.

Yeah strange it is.

Off-list I have asked Rafia to provide me the perf machine access where
she is doing this bench-marking to see what's going wrong.
Thanks Rafia for the details.

What I have observed is that there are two source trees, one with HEAD and the
other with HEAD+PWA, but their configuration switches were different. The
HEAD+PWA tree additionally has CFLAGS="-ggdb3 -O0" CXXFLAGS="-ggdb3 -O0", i.e.
HEAD+PWA is configured with debugging enabled and optimization disabled,
which accounts for the slowness.

I have run EXPLAIN for these three queries on both source trees with
exactly the same configuration switches, and I don't find any slowness with
the PWA patch-set.

Thus, it will be good if you re-run the benchmark keeping the configuration
switches the same on both source trees and share the results.

Thanks

Interesting. I checked with the configure flags kept the same for both
repos.
On head:
CONFIGURE = '--enable-cassert' 'CFLAGS=-ggdb3 -O0' 'CXXFLAGS=-ggdb3 -O0'
'prefix=/data/rafia.sabih/pg_head/install/'
On head+PWA:
CONFIGURE = '--enable-cassert' 'CFLAGS=-ggdb3 -O0' 'CXXFLAGS=-ggdb3 -O0'
'prefix=/data/rafia.sabih/pg_part_pa/install/'

The queries I previously reported now perform the same as on head with
the above-mentioned configuration. However, I experimented further with
partitionwise_join set to true and found the following cases of regression.
Q17 was taking some 1400 secs on head, but with PWA it takes some 1600
secs; it looks like the append of scan+aggregates is turning out to be
costlier than that of just the scans.
Q20 took 470 secs on head, and with PWA it takes 630 secs. The execution
plan has changed a lot; one thing in particular is that with the patch it
does not use a parallel bitmap heap scan on the lineitem table.

The experimental settings were kept same as before with the change of
partitionwise_join = 1.

Please find the attached zip for the explain analyse outputs.

However, do you see similar behaviour with the patches applied, with
"enable_partition_wise_agg = on" and "enable_partition_wise_agg = off"?

I tried that for query 18: with the patch and enable_partition_wise_agg =
off, the query completes in some 270 secs. You may find the explain analyse
output for it in the attached file. I noticed that on head the query plan
had a parallel hash join, whereas with the patch and no partition-wise agg it
uses nested loop joins. This might be the issue.

Also, does rest of the queries perform better with partition-wise
aggregates?

As far as this setting goes, there wasn't any other query using
partition-wise-agg, so, no.

BTW, just an FYI, this experiment is on scale factor 20.

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

#89

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#87)

4 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi Robert,

On Mon, Feb 26, 2018 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Committed after incorporating your other fixes and updating the
optimizer README.

Thanks Robert.

Off-list, Rajkumar has reported an issue. When enable_hashagg is set to
false and a Gather Merge path is chosen, the query ends up with an error saying
"ERROR: Aggref found in non-Agg plan node".

I had a look at the test case he provided and observed that when we create a
Gather Merge path over the cheapest partial path by sorting it explicitly
(since generate_gather_paths won't consider that), we accidentally used the
cheapest partial path from the input_rel to create the Gather Merge; instead
we need the cheapest partial path from the partially_grouped_rel.
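
The mix-up can be pictured with a toy model (hypothetical names throughout; this is not how the planner represents things): each path remembers which rel's target list it produces, and a Gather Merge that emits a rel's target is only sensible on top of a path belonging to that same rel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyRel ToyRel;

/* Toy path: remembers the rel whose target list it produces. */
typedef struct ToyPath
{
    const ToyRel *parent;
} ToyPath;

struct ToyRel
{
    ToyPath    *partial_pathlist[4];
    int         npartial;
};

/* First (assumed cheapest) partial path of a rel, or NULL if none. */
const ToyPath *
toy_first_partial_path(const ToyRel *rel)
{
    return rel->npartial > 0 ? rel->partial_pathlist[0] : NULL;
}

/* A Gather Merge emitting rel's target needs a subpath of that same rel. */
bool
toy_gather_merge_input_ok(const ToyRel *rel, const ToyPath *subpath)
{
    assert(rel != NULL);
    return subpath != NULL && subpath->parent == rel;
}
```

In this model, taking the first partial path of the input rel fails the check for the partially grouped rel, while taking it from the partially grouped rel passes, which is the substance of the one-line fix.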

The attached fix_aggref_in_non-agg_error.patch fixes this.
test_for_aggref_in_non-agg_error.patch has the test case reported by Rajkumar,
which I have added to aggregates.sql.

While doing so, I noticed a few cleanup changes and added those in
misc_cleanup.patch.

---

While re-basing my partitionwise aggregate changes, I observed that when we
create partial aggregation paths for a child partition, we don't
need to add Gather or Gather Merge on top of them, as we first want to append
them all and then stick a Gather on top. So it will be better to
have that part of the code in a separate function so that we can call it from
the required places.

I have attached a patch (create_non_partial_paths.patch) for it, including all
the above fixes.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

fix_aggref_in_non-agg_error.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8f6cc5..8190675 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6339,6 +6339,9 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 	 */
 	generate_gather_paths(root, partially_grouped_rel, true);
 
+	/* Get cheapest partial path from partially_grouped_rel */
+	cheapest_partial_path = linitial(partially_grouped_rel->partial_pathlist);
+
 	/*
 	 * generate_gather_paths won't consider sorting the cheapest path to match
 	 * the group keys and then applying a Gather Merge node to the result;

test_for_aggref_in_non-agg_error.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f85e913..fa56f6b 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2065,3 +2065,48 @@ SELECT balk(hundred) FROM tenk1;
 (1 row)
 
 ROLLBACK;
+-- Test for partially_grouped_rel
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET enable_hashagg TO false;
+CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO pagg_tab1 SELECT i%30, i FROM generate_series(0, 29) i;
+INSERT INTO pagg_tab2 SELECT i, i%30 FROM generate_series(0, 29) i;
+ANALYZE pagg_tab1;
+ANALYZE pagg_tab2;
+EXPLAIN (COSTS OFF)
+SELECT t1.y, sum(t1.x), COUNT(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y ORDER BY 1, 2, 3;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Sort
+   Sort Key: t1.y, (sum(t1.x)), (count(*))
+   ->  Finalize GroupAggregate
+         Group Key: t1.y
+         ->  Gather Merge
+               Workers Planned: 2
+               ->  Partial GroupAggregate
+                     Group Key: t1.y
+                     ->  Sort
+                           Sort Key: t1.y
+                           ->  Parallel Hash Join
+                                 Hash Cond: (t1.x = t2.y)
+                                 ->  Parallel Append
+                                       ->  Parallel Seq Scan on pagg_tab1_p1 t1
+                                       ->  Parallel Seq Scan on pagg_tab1_p2 t1_1
+                                       ->  Parallel Seq Scan on pagg_tab1_p3 t1_2
+                                 ->  Parallel Hash
+                                       ->  Parallel Append
+                                             ->  Parallel Seq Scan on pagg_tab2_p1 t2
+                                             ->  Parallel Seq Scan on pagg_tab2_p2 t2_1
+                                             ->  Parallel Seq Scan on pagg_tab2_p3 t2_2
+(21 rows)
+
+DROP TABLE pagg_tab2;
+DROP TABLE pagg_tab1;
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 506d044..17c25e5 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -907,3 +907,30 @@ EXPLAIN (COSTS OFF) SELECT balk(hundred) FROM tenk1;
 SELECT balk(hundred) FROM tenk1;
 
 ROLLBACK;
+
+-- Test for partially_grouped_rel
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET enable_hashagg TO false;
+
+CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30);
+
+CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30);
+
+INSERT INTO pagg_tab1 SELECT i%30, i FROM generate_series(0, 29) i;
+INSERT INTO pagg_tab2 SELECT i, i%30 FROM generate_series(0, 29) i;
+
+ANALYZE pagg_tab1;
+ANALYZE pagg_tab2;
+
+EXPLAIN (COSTS OFF)
+SELECT t1.y, sum(t1.x), COUNT(*) FROM pagg_tab1 t1, pagg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.y ORDER BY 1, 2, 3;
+
+DROP TABLE pagg_tab2;
+DROP TABLE pagg_tab1;

misc_cleanup.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8f6cc5..107d5f3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -195,12 +195,11 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  double dNumGroups, List *havingQual);
 static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
-								  RelOptInfo *partial_grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
 								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
-								  bool can_hash,
-								  List *havingQual);
+								  bool can_hash);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
 
@@ -3838,8 +3837,7 @@ create_grouping_paths(PlannerInfo *root,
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
 										  &agg_partial_costs,
-										  gd, can_sort, can_hash,
-										  (List *) parse->havingQual);
+										  gd, can_sort, can_hash);
 	}
 
 	/* Build final grouping paths */
@@ -6224,8 +6222,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
-								  bool can_hash,
-								  List *havingQual)
+								  bool can_hash)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_partial_path = linitial(input_rel->partial_pathlist);

create_non_partial_paths.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e8f6cc5..8ceef22 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -195,12 +195,13 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  double dNumGroups, List *havingQual);
 static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
-								  RelOptInfo *partial_grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
 								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
-								  bool can_hash,
-								  List *havingQual);
+								  bool can_hash);
+static void create_non_partial_paths(PlannerInfo *root,
+						 RelOptInfo *partially_grouped_rel);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
 
@@ -3838,8 +3839,10 @@ create_grouping_paths(PlannerInfo *root,
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
 										  &agg_partial_costs,
-										  gd, can_sort, can_hash,
-										  (List *) parse->havingQual);
+										  gd, can_sort, can_hash);
+
+		/* Add Gather or Gather Merge atop cheapest partial path. */
+		create_non_partial_paths(root, partially_grouped_rel);
 	}
 
 	/* Build final grouping paths */
@@ -6224,8 +6227,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
 								  bool can_sort,
-								  bool can_hash,
-								  List *havingQual)
+								  bool can_hash)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_partial_path = linitial(input_rel->partial_pathlist);
@@ -6332,13 +6334,26 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 										 UPPERREL_PARTIAL_GROUP_AGG,
 										 input_rel, partially_grouped_rel);
 	}
+}
 
-	/*
-	 * Try adding Gather or Gather Merge to partial paths to produce
-	 * non-partial paths.
-	 */
+/*
+ * create_non_partial_paths
+ *
+ * Try adding Gather or Gather Merge to partial paths to produce non-partial
+ * paths.
+ */
+static void
+create_non_partial_paths(PlannerInfo *root,
+						 RelOptInfo *partially_grouped_rel)
+{
+	Path	   *cheapest_partial_path;
+
+	/* Gather all partial paths */
 	generate_gather_paths(root, partially_grouped_rel, true);
 
+	/* Get cheapest partial path from partially_grouped_rel */
+	cheapest_partial_path = linitial(partially_grouped_rel->partial_pathlist);
+
 	/*
 	 * generate_gather_paths won't consider sorting the cheapest path to match
 	 * the group keys and then applying a Gather Merge node to the result;

#90

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#89)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Feb 27, 2018 at 4:29 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hi Robert,
I had a look at the test case he provided and observed that when we create a
Gather Merge path over the cheapest partial path by sorting it explicitly
(since generate_gather_paths won't consider that), we accidentally used the
cheapest partial path from the input_rel to create the Gather Merge; instead
we need the cheapest partial path from the partially_grouped_rel.

Attached fix_aggref_in_non-agg_error.patch fixing this.

Oops. Thanks, committed.

test_for_aggref_in_non-agg_error.patch has the test case reported by Rajkumar,
which I have added to aggregates.sql.

Didn't commit this; I think that's overkill.

While doing so, I noticed a few cleanup changes and added those in
misc_cleanup.patch.

Committed those.

While re-basing my partitionwise aggregate changes, I observed that when we
create partial aggregation paths for a child partition, we don't
need to add Gather or Gather Merge on top of them, as we first want to append
them all and then stick a Gather on top. So it will be better to have
that part of the code in a separate function so that we can call it from the
required places.

I have attached a patch (create_non_partial_paths.patch) for it, including all
the above fixes.

I don't like that very much. For one thing, the name
create_non_partial_paths() is not very descriptive at all. For
another thing, it somewhat renders add_paths_to_partial_grouping_rel()
a misnomer, as that function then adds only partial paths. I think
what you should just do is have the main patch add a test for
rel->reloptkind == RELOPT_UPPER_REL; if true, add the Gather paths; if
not, skip it. Then it will be skipped for RELOPT_OTHER_UPPER_REL
which is what we want.
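
A minimal sketch of the suggested test, using hypothetical toy names rather than the real RelOptKind enum: Gather paths are added only for the top-level upper rel and skipped for the per-child upper rels.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mirror of the two rel kinds named above; not the real enum. */
typedef enum ToyRelOptKind
{
    TOY_RELOPT_UPPER_REL,       /* top-level upper rel */
    TOY_RELOPT_OTHER_UPPER_REL  /* upper rel for a child, as used in this thread */
} ToyRelOptKind;

/* Only the top-level upper rel gets Gather / Gather Merge paths. */
bool
toy_should_add_gather(ToyRelOptKind kind)
{
    return kind == TOY_RELOPT_UPPER_REL;
}

/* Count how many of the given rels would receive Gather paths. */
int
toy_count_gathers(const ToyRelOptKind *kinds, int n)
{
    int         count = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (toy_should_add_gather(kinds[i]))
            count++;
    }
    return count;
}
```

For one parent and three children, only the parent qualifies, so the children's partial paths stay un-gathered until they have been appended.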

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#91

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#87)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Feb 26, 2018 at 8:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Committed after incorporating your other fixes and updating the
optimizer README.

Attached new patchset after rebasing my changes over these changes and on
latest HEAD.

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#92

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#91)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new patchset after rebasing my changes over these changes and on
latest HEAD.

+        * We have already created a Gather or Gather Merge path atop cheapest
+        * partial path. Thus the partial path referenced by the Gather node needs
+        * to be preserved as adding new partial paths in same rel may delete this
+        * referenced path. To do this we need to clear the partial_pathlist from
+        * the partially_grouped_rel as we may add partial paths again while doing
+        * partitionwise aggregation. Keeping older partial path intact seems
+        * reasonable too as it might possible that the final path chosen which is
+        * using it wins, but the underneath partial path is not the cheapest one.

This isn't a good design. You shouldn't create a Gather or Gather
Merge node until all partial paths have been added. I mean, the point
is to put a Gather node on top of the cheapest path, not the path that
is currently the cheapest but might not actually be the cheapest once
we've added them all.

+add_gather_or_gather_merge(PlannerInfo *root,

Please stop picking generic function names for functions that have
very specific purposes. I don't really think that you need this to be
a separate function at all, but it is certainly NOT a
general-purpose function for adding a Gather or Gather Merge node.

+        /*
+         * Collect statistics about aggregates for estimating costs of
+         * performing aggregation in parallel or in partial, if not already
+         * done. We use same cost for all the children as they will be same
+         * anyways.
+         */

If it only needs to be done once, do we really have to have it inside
the loop? I see that you're using the varno-translated
partial_target->exprs and target->exprs, but if the specific varnos
don't matter, why not just use the untranslated version of the targets
before entering the loop? And if the specific varnos do matter, then
presumably you need to do it every time.

This is not a full review, but I'm out of time for right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#93

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#92)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 1, 2018 at 4:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This is not a full review, but I'm out of time for right now.

Another thing I see here now is that create_grouping_paths() and
create_child_grouping_paths() are extremely similar. Isn't there some
way we can refactor things so that we can reuse the code instead of
duplicating it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#94

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#92)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 2, 2018 at 3:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Attached new patchset after rebasing my changes over these changes and on
latest HEAD.

+        * We have already created a Gather or Gather Merge path atop cheapest
+        * partial path. Thus the partial path referenced by the Gather node needs
+        * to be preserved as adding new partial paths in same rel may delete this
+        * referenced path. To do this we need to clear the partial_pathlist from
+        * the partially_grouped_rel as we may add partial paths again while doing
+        * partitionwise aggregation. Keeping older partial path intact seems
+        * reasonable too as it might possible that the final path chosen which is
+        * using it wins, but the underneath partial path is not the cheapest one.

This isn't a good design. You shouldn't create a Gather or Gather
Merge node until all partial paths have been added. I mean, the point
is to put a Gather node on top of the cheapest path, not the path that
is currently the cheapest but might not actually be the cheapest once
we've added them all.

To be honest, I didn't know that we should not generate Gather or Gather
Merge paths until all possible partial paths are in place. I realized it only
recently, while debugging an issue reported by Rajkumar off-list. While
working on that fix, I observed the following:
- I have a cheapest partial path with a cost of, say, 10; a Gather on top of
it increases the cost to 11.
- Later I add a partial path with a cost of, say, 9, but a Gather on top of
it results in a total cost of 12.
This means the first Gather path is the cheapest one, but the partial path
underneath it is not, and unfortunately that path got removed when my new
partial path was added to the partial_pathlist.
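
The numbers above can be reproduced with a small, self-contained C sketch.
This is a toy model only, not planner code: ToyPath, gather_overhead and
both toy_* functions are hypothetical names made up for illustration, and
the per-path Gather overhead is simply assumed rather than derived from the
planner's cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a partial path; NOT PostgreSQL's Path struct. */
typedef struct ToyPath
{
	double		cost;				/* total cost of the partial path */
	double		gather_overhead;	/* assumed extra cost of a Gather on top */
} ToyPath;

/* Index of the cheapest partial path among the first npaths, or -1 if none. */
int
toy_cheapest_partial(const ToyPath *paths, size_t npaths)
{
	int			best = -1;

	assert(paths != NULL || npaths == 0);
	for (size_t i = 0; i < npaths; i++)
	{
		if (best < 0 || paths[i].cost < paths[best].cost)
			best = (int) i;
	}
	return best;
}

/* Cost of a Gather placed on the cheapest of the first npaths partial paths. */
double
toy_gather_cost(const ToyPath *paths, size_t npaths)
{
	int			best = toy_cheapest_partial(paths, npaths);

	if (best < 0)
		return -1.0;			/* no partial path, so no Gather */
	return paths[best].cost + paths[best].gather_overhead;
}
```

With paths {10, overhead 1} and {9, overhead 3}, a Gather built when only
the first path exists sits on path 0, but once the second path is added the
cheapest partial path is path 1. A Gather created early can therefore end up
referencing a path that is no longer the cheapest one in the list.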

Because of this, I thought it better to keep both paths valid, and to avoid
deleting the earlier cheapest partial path I chose to reset
partially_grouped_rel->partial_pathlist.

But yes, per the comment in generate_gather_paths() and as you said, we should
add Gather or Gather Merge only after we are done creating all partial
paths. Sorry for not knowing this before.

+add_gather_or_gather_merge(PlannerInfo *root,

Please stop picking generic function names for functions that have
very specific purposes. I don't really think that you need this to be
a separate function at all, but it is certainly NOT a
general-purpose function for adding a Gather or Gather Merge node.

OK. Got it now.

+        /*
+         * Collect statistics about aggregates for estimating costs of
+         * performing aggregation in parallel or in partial, if not already
+         * done. We use same cost for all the children as they will be same
+         * anyways.
+         */
If it only needs to be done once, do we really have to have it inside
the loop? I see that you're using the varno-translated
partial_target->exprs and target->exprs, but if the specific varnos
don't matter, why not just use the untranslated version of the targets
before entering the loop? And if the specific varnos do matter, then
presumably you need to do it every time.

Yes. It can be pulled outside the loop.

This is not a full review, but I'm out of time for right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#95

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#93)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Sat, Mar 3, 2018 at 12:12 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 1, 2018 at 4:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This is not a full review, but I'm out of time for right now.

Another thing I see here now is that create_grouping_paths() and
create_child_grouping_paths() are extremely similar. Isn't there some
way we can refactor things so that we can reuse the code instead of
duplicating it?

Yes. I too observed the same after our re-design.

To avoid code duplication, I am now calling create_grouping_paths() for
child relation too.

However, to perform Gather or Gather Merge once we have all partial paths
ready, and to avoid rearranging too much existing code, I am calling
try_partitionwise_grouping() before we do any aggregation/grouping on the
whole relation. By doing this, we will have all partial paths in
partially_grouped_rel, and the existing code will then do the required
finalization along with any Gather or Gather Merge, if required.

Please have a look at the attached patch-set and let me know if it needs
further changes.

Thanks

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#96

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#95)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

However, to perform Gather or Gather Merge once we have all partial paths
ready, and to avoid rearranging too much existing code, I am calling
try_partitionwise_grouping() before we do any aggregation/grouping on the
whole relation. By doing this, we will have all partial paths in
partially_grouped_rel, and the existing code will then do the required
finalization along with any Gather or Gather Merge, if required.

Please have a look at the attached patch-set and let me know if it needs
further changes.

This does look better.

+ <term><varname>enable_partitionwise_agg</varname> (<type>boolean</type>)

Please don't abbreviate "aggregate" to "agg".

-       /* Build final grouping paths */
-       add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-                                 partially_grouped_rel, agg_costs,
-                                 &agg_final_costs, gd, can_sort, can_hash,
-                                 dNumGroups, (List *) parse->havingQual);
+       if (isPartialAgg)
+       {
+               Assert(agg_partial_costs && agg_final_costs);
+               add_paths_to_partial_grouping_rel(root, input_rel,
+                                                 partially_grouped_rel,
+                                                 agg_partial_costs,
+                                                 gd, can_sort, can_hash,
+                                                 false, true);
+       }
+       else
+       {
+               double          dNumGroups;
+
+               /* Estimate number of groups. */
+               dNumGroups = get_number_of_groups(root,
+                                                 cheapest_path->rows,
+                                                 gd,
+                                                 child_data ? make_tlist_from_pathtarget(target) : parse->targetList);
+
+               /* Build final grouping paths */
+               add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+                                         partially_grouped_rel, agg_costs,
+                                         agg_final_costs, gd, can_sort, can_hash,
+                                         dNumGroups, (List *) havingQual);
+       }

This looks strange. Why do we go down two completely different code
paths here? It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg. I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed. On the other hand, for non-partial paths, we can add them to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL. But that doesn't appear to be what this
is doing.
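
One way to read the routing described above is the following toy C sketch.
It is only an interpretation of the paragraph, not code from the patch: the
Toy* types and toy_place_agg_path() are hypothetical, and the enum values
merely mirror the names AGGSPLIT_SIMPLE and AGGSPLIT_INITIAL_SERIAL.

```c
#include <stdbool.h>

/* Hypothetical mirrors of the two aggregation modes mentioned above. */
typedef enum ToyAggSplit
{
	TOY_AGGSPLIT_SIMPLE,			/* full aggregation in one step */
	TOY_AGGSPLIT_INITIAL_SERIAL		/* partial aggregation, finalized later */
} ToyAggSplit;

/* Which rel receives the new aggregate path. */
typedef enum ToyTargetRel
{
	TOY_GROUPED_REL,
	TOY_PARTIALLY_GROUPED_REL
} ToyTargetRel;

typedef struct ToyPlacement
{
	ToyTargetRel rel;				/* destination rel */
	bool		in_partial_pathlist;	/* partial_pathlist vs. pathlist */
	ToyAggSplit split;				/* aggregation mode to use */
} ToyPlacement;

/*
 * Decide where an aggregate path goes.  A path that runs in parallel is
 * always partially aggregated, whatever is_partial_agg says, because the
 * same group can show up in several workers.  Only non-parallel paths
 * depend on is_partial_agg.
 */
ToyPlacement
toy_place_agg_path(bool path_is_parallel, bool is_partial_agg)
{
	ToyPlacement p;

	if (path_is_parallel || is_partial_agg)
	{
		p.rel = TOY_PARTIALLY_GROUPED_REL;
		p.split = TOY_AGGSPLIT_INITIAL_SERIAL;
	}
	else
	{
		p.rel = TOY_GROUPED_REL;
		p.split = TOY_AGGSPLIT_SIMPLE;
	}
	p.in_partial_pathlist = path_is_parallel;
	return p;
}
```

In this reading, is_partial_agg changes the outcome only for non-parallel
paths, which is the point being made about the partial_pathlist.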

+       /*
+        * If there are any fully aggregated partial paths present, may be because
+        * of parallel Append over partitionwise aggregates, we must stick a
+        * Gather or Gather Merge path atop the cheapest partial path.
+        */
+       if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

More tomorrow...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#97

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#96)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 6, 2018 at 2:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

However, to perform Gather or Gather Merge once we have all partial paths
ready, and to avoid rearranging too much existing code, I am calling
try_partitionwise_grouping() before we do any aggregation/grouping on the
whole relation. By doing this, we will have all partial paths in
partially_grouped_rel, and the existing code will then do the required
finalization along with any Gather or Gather Merge, if required.

Please have a look at the attached patch-set and let me know if it needs
further changes.

This does look better.

Thank you, Robert.

+ <term><varname>enable_partitionwise_agg</varname> (<type>boolean</type>)

Please don't abbreviate "aggregate" to "agg".

This is in line with the enable_hashagg GUC. Do you think
enable_partitionwise_aggregate would be better? It would then be
inconsistent with other GUC names like enable_hashagg.

-       /* Build final grouping paths */
-       add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-                                 partially_grouped_rel, agg_costs,
-                                 &agg_final_costs, gd, can_sort, can_hash,
-                                 dNumGroups, (List *) parse->havingQual);
+       if (isPartialAgg)
+       {
+               Assert(agg_partial_costs && agg_final_costs);
+               add_paths_to_partial_grouping_rel(root, input_rel,
+                                                 partially_grouped_rel,
+                                                 agg_partial_costs,
+                                                 gd, can_sort, can_hash,
+                                                 false, true);
+       }
+       else
+       {
+               double          dNumGroups;
+
+               /* Estimate number of groups. */
+               dNumGroups = get_number_of_groups(root,
+                                                 cheapest_path->rows,
+                                                 gd,
+                                                 child_data ? make_tlist_from_pathtarget(target) : parse->targetList);
+
+               /* Build final grouping paths */
+               add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+                                         partially_grouped_rel, agg_costs,
+                                         agg_final_costs, gd, can_sort, can_hash,
+                                         dNumGroups, (List *) havingQual);
+       }

This looks strange. Why do we go down two completely different code
paths here?

It is because when isPartialAgg = true we need to create partially
aggregated non-partial paths, which should be added to
partially_grouped_rel->pathlist. And when isPartialAgg = false, we are
creating fully aggregated paths, which go into grouped_rel->pathlist.

It seems to me that the set of paths we add to the
partial_pathlist shouldn't depend at all on isPartialAgg. I might be
confused, but it seems to me that any aggregate path we construct that
is going to run in parallel must necessarily be partial, because even
if each group will occur only in one partition, it might still occur
in multiple workers for that partition, so finalization would be
needed.

That's true. We are creating partially aggregated partial paths for this
and keeping them in partially_grouped_rel->partial_pathlist.

On the other hand, for non-partial paths, we can add them to
partially_grouped_rel when isPartialAgg = true and to grouped_rel when
isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE
vs. AGGSPLIT_INITIAL_SERIAL.

Yes. As explained above, they go in the pathlist of the respective rel.
However, the PathTarget is different too: we need partial_pathtarget when
isPartialAgg = true, and we also need agg_partial_costs.

But that doesn't appear to be what this
is doing.

So the code for creating partially aggregated partial paths and partially
aggregated non-partial paths is the same, except that partial paths go into
the partial_pathlist whereas non-partial paths go into the pathlist of
partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
when isPartialAgg = true seems correct. Also, as we have decided, this
function is responsible for creating all partially aggregated paths, both
partial and non-partial.

Am I missing something?

+       /*
+        * If there are any fully aggregated partial paths present, may be because
+        * of parallel Append over partitionwise aggregates, we must stick a
+        * Gather or Gather Merge path atop the cheapest partial path.
+        */
+       if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

Well, these comments are not present anywhere other than this place. With
Parallel Append and partitionwise aggregates, it is now possible to have
fully aggregated partial paths. Thus we need to stick a Gather and/or
Gather Merge atop the cheapest partial path, and I believe the code does
the same. Am I missing something?

More tomorrow...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#98

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#95)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi Jeevan,
I am back reviewing this. Here are some comments.

@@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
          * the unparameterized Append path we are constructing for the parent.
          * If not, there's no workable unparameterized path.
          */
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
             accumulate_append_subpath(childrel->cheapest_total_path,
                                       &subpaths, NULL);
         else
@@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
             RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
             Path       *subpath;

+            if (childrel->pathlist == NIL)
+            {
+                /* failed to make a suitable path for this child */
+                subpaths_valid = false;
+                break;
+            }
+
When can childrel->pathlist be NIL?

diff --git a/src/backend/optimizer/plan/createplan.c
b/src/backend/optimizer/plan/createplan.c
index 9ae1bf3..f90626c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
*best_path, int flags)
     subplan = create_plan_recurse(root, best_path->subpath,
                                   flags | CP_SMALL_TLIST);

-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys, NULL);
+    /*
+     * In find_ec_member_for_tle(), child EC members are ignored if they don't
+     * belong to the given relids. Thus, if this sort path is based on a child
+     * relation, we must pass the relids of it. Otherwise, we will end-up into
+     * an error requiring pathkey item.
+     */
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent) ?
+                                   best_path->path.parent->relids : NULL);

copy_generic_path_info(&plan->plan, (Path *) best_path);

Please separate this small adjustment in a patch of its own, with some
explanation of why we need it i.e. now this function can see SortPaths from
child (other) relations.

+    if (child_data)
+    {
+        /* Must be other rel as all child relations are marked OTHER_RELs */
+        Assert(IS_OTHER_REL(input_rel));

I think we should check IS_OTHER_REL() and Assert(child_data). That way we know
that the code in the if block is executed for OTHER relation.

-    if ((root->hasHavingQual || parse->groupingSets) &&
+    if (!child_data && (root->hasHavingQual || parse->groupingSets) &&

Degenerate grouping will never see child relations, so instead of checking for
child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
explaining the assertion.

+     *
+     * If we are performing grouping for a child relation, fetch can_sort from
+     * the child_data to avoid re-calculating same.
      */
-    can_sort = (gd && gd->rollups != NIL)
-        || grouping_is_sortable(parse->groupClause);
+    can_sort = child_data ? child_data->can_sort : ((gd && gd->rollups != NIL) ||
+                                                    grouping_is_sortable(parse->groupClause));

Instead of adding a conditional here, we can compute these values before
create_grouping_paths() is called from grouping_planner() and then pass them
down to try_partitionwise_grouping(). I have attached a patch which refactors
this code. Please see if this refactoring is useful. In the attached patch, I
have handled can_sort, can_hash and partial aggregation costs. More on the last
component below.

     /*
      * Figure out whether a PartialAggregate/Finalize Aggregate execution
@@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
      * partial paths for partially_grouped_rel; that way, later code can
      * easily consider both parallel and non-parallel approaches to grouping.
      */
-    if (try_parallel_aggregation)
+    if (!child_data && !(agg_costs->hasNonPartial || agg_costs->hasNonSerial))
     {
-        PathTarget *partial_grouping_target;
-
[... clipped ...]
+            get_agg_clause_costs(root, havingQual,
                                  AGGSPLIT_FINAL_DESERIAL,
-                                 &agg_final_costs);
+                                 agg_final_costs);
         }
+    }

With this change, we are computing partial aggregation costs even in cases
where they will not be used, e.g. when there are no children and parallel
paths cannot be created. In the attached patch, I have refactored the code so
that they are computed the first time they are needed and re-used later.
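
The compute-once idea can be sketched in isolation as follows. This is a
simplified illustration rather than the attached patch: ToyCosts, ToyExtra,
compute_calls and the toy_* functions are hypothetical, and the cost values
are placeholders standing in for what get_agg_clause_costs() would fill in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the two cost structs; NOT AggClauseCosts. */
typedef struct ToyCosts
{
	double		partial;		/* cost of the partial phase */
	double		final;			/* cost of the finalization phase */
} ToyCosts;

typedef struct ToyExtra
{
	bool		partial_costs_set;	/* are the costs below valid? */
	ToyCosts	costs;
	int			compute_calls;	/* illustration only: counts computations */
} ToyExtra;

/* Start with no costs computed. */
void
toy_init_extra(ToyExtra *extra)
{
	assert(extra != NULL);
	extra->partial_costs_set = false;
	extra->costs.partial = 0.0;
	extra->costs.final = 0.0;
	extra->compute_calls = 0;
}

/* Compute the costs on first use; later callers get the cached values. */
const ToyCosts *
toy_get_partial_costs(ToyExtra *extra)
{
	assert(extra != NULL);
	if (!extra->partial_costs_set)
	{
		/* Placeholder values; the real code would inspect the aggregates. */
		extra->costs.partial = 1.0;
		extra->costs.final = 2.0;
		extra->compute_calls++;
		extra->partial_costs_set = true;
	}
	return &extra->costs;
}
```

A caller that never needs partial aggregation never pays for the
computation, while the parent and every child that do need it share a single
result.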

+    if (child_data)
+    {
+        partial_grouping_target = child_data->partialRelTarget;
+        partially_grouped_rel->reltarget = partial_grouping_target;
+        agg_partial_costs = child_data->agg_partial_costs;
+        agg_final_costs = child_data->agg_final_costs;
+    }

I think, with the refactoring, we can get rid of the last two lines here. I
think we can get rid of this block entirely, but I have not reviewed the entire
code to confirm that.

 static PathTarget *
-make_partial_grouping_target(PlannerInfo *root, PathTarget *grouping_target)
+make_partial_grouping_target(PlannerInfo *root,
+                             PathTarget *grouping_target,
+                             Node *havingQual)
This looks like a refactoring change. Should go to one of the refactoring
patches or in a patch of its own.

This isn't a full review. I will continue reviewing this further.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_cgp_refactor.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index de1257d..8460134 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -109,6 +109,28 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } grouping_sets_data;
 
+/*
+ * Struct for extra information passed to create_grouping_paths
+ *
+ * can_hash is true if hash-based grouping is possible, false otherwise.
+ * can_sort is true if sort-based grouping is possible, false otherwise.
+ * can_partial_agg is true if partial aggregation is possible, false otherwise.
+ * partial_costs_set indicates whether agg_partial_costs and agg_final_costs
+ *		have valid costs set. Both of those are computed only when partial
+ *		aggregation is required.
+ * agg_partial_costs gives partial aggregation costs.
+ * agg_final_costs gives finalization costs.
+ */
+typedef struct
+{
+	bool		can_hash;
+	bool		can_sort;
+	bool		can_partial_agg;
+	bool		partial_costs_set;
+	AggClauseCosts agg_partial_costs;
+	AggClauseCosts agg_final_costs;
+} GroupPathExtraData;
+
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -138,7 +160,8 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 					  RelOptInfo *input_rel,
 					  PathTarget *target,
 					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd);
+					  grouping_sets_data *gd,
+					  GroupPathExtraData *extra);
 static void consider_groupingsets_paths(PlannerInfo *root,
 							RelOptInfo *grouped_rel,
 							Path *path,
@@ -201,7 +224,11 @@ static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  bool can_sort,
 								  bool can_hash);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
+				 RelOptInfo *grouped_rel, GroupPathExtraData *extra);
+static void compute_group_path_extra_data(PlannerInfo *root,
+							  GroupPathExtraData *extra,
+							  grouping_sets_data *gd,
+							  const AggClauseCosts *agg_costs);
 
 
 /*****************************************************************************
@@ -1981,11 +2008,17 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 */
 		if (have_grouping)
 		{
+			GroupPathExtraData group_extra;
+
+			compute_group_path_extra_data(root, &group_extra, gset_data,
+										  &agg_costs);
+
 			current_rel = create_grouping_paths(root,
 												current_rel,
 												grouping_target,
 												&agg_costs,
-												gset_data);
+												gset_data,
+												&group_extra);
 			/* Fix things up if grouping_target contains SRFs */
 			if (parse->hasTargetSRFs)
 				adjust_paths_for_srfs(root, current_rel,
@@ -3595,6 +3628,67 @@ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
 }
 
 /*
+ * compute_group_path_extra_data
+ *	  Compute extra information required for grouping operation specified in
+ *	  the query.
+ */
+static void
+compute_group_path_extra_data(PlannerInfo *root, GroupPathExtraData *extra,
+							  grouping_sets_data *gd,
+							  const AggClauseCosts *agg_costs)
+{
+	Query	   *parse = root->parse;
+
+	/*
+	 * Determine whether it's possible to perform sort-based implementations
+	 * of grouping.  (Note that if groupClause is empty,
+	 * grouping_is_sortable() is trivially true, and all the
+	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
+	 * every surviving input path.)
+	 *
+	 * If we have grouping sets, we might be able to sort some but not all of
+	 * them; in this case, we need can_sort to be true as long as we must
+	 * consider any sorted-input plan.
+	 */
+	extra->can_sort = (gd && gd->rollups != NIL) ||
+					  grouping_is_sortable(parse->groupClause);
+
+	/*
+	 * Determine whether we should consider hash-based implementations of
+	 * grouping.
+	 *
+	 * Hashed aggregation only applies if we're grouping. If we have grouping
+	 * sets, some groups might be hashable but others not; in this case we set
+	 * can_hash true as long as there is nothing globally preventing us from
+	 * hashing (and we should therefore consider plans with hashes).
+	 *
+	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+	 * aggregates.  (Doing so would imply storing *all* the input values in
+	 * the hash table, and/or running many sorts in parallel, either of which
+	 * seems like a certain loser.)  We similarly don't support ordered-set
+	 * aggregates in hashed aggregation, but that case is also included in the
+	 * numOrderedAggs count.
+	 *
+	 * Note: grouping_is_hashable() is much more expensive to check than the
+	 * other gating conditions, so we want to do it last.
+	 */
+	extra->can_hash = (parse->groupClause != NIL &&
+					   agg_costs->numOrderedAggs == 0 &&
+					   (gd ? gd->any_hashable :
+							 grouping_is_hashable(parse->groupClause)));
+
+	/*
+	 * Set partial aggregation costs if we are going to calculate partial
+	 * aggregates in create_grouping_paths().
+	 */
+	extra->partial_costs_set = false;
+
+	/* Is partial aggregation possible? */
+	extra->can_partial_agg = (agg_costs->hasNonPartial ||
+							  agg_costs->hasNonSerial);
+}
+
+/*
  * create_grouping_paths
  *
  * Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3624,17 +3718,18 @@ create_grouping_paths(PlannerInfo *root,
 					  RelOptInfo *input_rel,
 					  PathTarget *target,
 					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd)
+					  grouping_sets_data *gd,
+					  GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
-	AggClauseCosts agg_partial_costs;	/* parallel only */
-	AggClauseCosts agg_final_costs; /* parallel only */
+	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
+	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups;
-	bool		can_hash;
-	bool		can_sort;
+	bool		can_hash = extra->can_hash;
+	bool		can_sort = extra->can_sort;
 	bool		try_parallel_aggregation;
 
 	/*
@@ -3750,48 +3845,11 @@ create_grouping_paths(PlannerInfo *root,
 									  gd);
 
 	/*
-	 * Determine whether it's possible to perform sort-based implementations
-	 * of grouping.  (Note that if groupClause is empty,
-	 * grouping_is_sortable() is trivially true, and all the
-	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
-	 * every surviving input path.)
-	 *
-	 * If we have grouping sets, we might be able to sort some but not all of
-	 * them; in this case, we need can_sort to be true as long as we must
-	 * consider any sorted-input plan.
-	 */
-	can_sort = (gd && gd->rollups != NIL)
-		|| grouping_is_sortable(parse->groupClause);
-
-	/*
-	 * Determine whether we should consider hash-based implementations of
-	 * grouping.
-	 *
-	 * Hashed aggregation only applies if we're grouping. If we have grouping
-	 * sets, some groups might be hashable but others not; in this case we set
-	 * can_hash true as long as there is nothing globally preventing us from
-	 * hashing (and we should therefore consider plans with hashes).
-	 *
-	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
-	 * aggregates.  (Doing so would imply storing *all* the input values in
-	 * the hash table, and/or running many sorts in parallel, either of which
-	 * seems like a certain loser.)  We similarly don't support ordered-set
-	 * aggregates in hashed aggregation, but that case is also included in the
-	 * numOrderedAggs count.
-	 *
-	 * Note: grouping_is_hashable() is much more expensive to check than the
-	 * other gating conditions, so we want to do it last.
-	 */
-	can_hash = (parse->groupClause != NIL &&
-				agg_costs->numOrderedAggs == 0 &&
-				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
-
-	/*
 	 * Figure out whether a PartialAggregate/Finalize Aggregate execution
 	 * strategy is viable.
 	 */
 	try_parallel_aggregation = can_parallel_agg(root, input_rel, grouped_rel,
-												agg_costs);
+												extra);
 
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
@@ -3812,38 +3870,45 @@ create_grouping_paths(PlannerInfo *root,
 		partial_grouping_target = make_partial_grouping_target(root, target);
 		partially_grouped_rel->reltarget = partial_grouping_target;
 
-		/*
-		 * Collect statistics about aggregates for estimating costs of
-		 * performing aggregation in parallel.
-		 */
-		MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
-		MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-		if (parse->hasAggs)
+		/* Set partial aggregation costs, if not already computed. */
+		if (!extra->partial_costs_set)
 		{
-			/* partial phase */
-			get_agg_clause_costs(root, (Node *) partial_grouping_target->exprs,
-								 AGGSPLIT_INITIAL_SERIAL,
-								 &agg_partial_costs);
-
-			/* final phase */
-			get_agg_clause_costs(root, (Node *) target->exprs,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-			get_agg_clause_costs(root, parse->havingQual,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
+			/*
+			 * Collect statistics about aggregates for estimating costs of
+			 * performing aggregation in parallel.
+			 */
+			MemSet(agg_partial_costs, 0, sizeof(AggClauseCosts));
+			MemSet(agg_final_costs, 0, sizeof(AggClauseCosts));
+			if (parse->hasAggs)
+			{
+				/* partial phase */
+				get_agg_clause_costs(root,
+									 (Node *) partial_grouping_target->exprs,
+									 AGGSPLIT_INITIAL_SERIAL,
+									 agg_partial_costs);
+
+				/* final phase */
+				get_agg_clause_costs(root, (Node *) target->exprs,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 agg_final_costs);
+				get_agg_clause_costs(root, parse->havingQual,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 agg_final_costs);
+			}
+
+			extra->partial_costs_set = true;
 		}
 
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
-										  &agg_partial_costs,
+										  agg_partial_costs,
 										  gd, can_sort, can_hash);
 	}
 
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
 							  partially_grouped_rel, agg_costs,
-							  &agg_final_costs, gd, can_sort, can_hash,
+							  agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
 
 	/* Give a helpful error if we failed to find any implementation */
@@ -6380,7 +6445,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
  */
 static bool
 can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs)
+				 RelOptInfo *grouped_rel, GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 
@@ -6407,7 +6472,7 @@ can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 		/* We don't know how to do grouping sets in parallel. */
 		return false;
 	}
-	else if (agg_costs->hasNonPartial || agg_costs->hasNonSerial)
+	else if (extra->can_partial_agg)
 	{
 		/* Insufficient support for partial mode. */
 		return false;

#99

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#98)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Hi Jeevan,
I am back reviewing this. Here are some comments.

@@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
* the unparameterized Append path we are constructing for the
parent.
* If not, there's no workable unparameterized path.
*/
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
accumulate_append_subpath(childrel->cheapest_total_path,
&subpaths, NULL);
else
@@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
Path       *subpath;

+            if (childrel->pathlist == NIL)
+            {
+                /* failed to make a suitable path for this child */
+                subpaths_valid = false;
+                break;
+            }
+
When can childrel->pathlist be NIL?

diff --git a/src/backend/optimizer/plan/createplan.c
b/src/backend/optimizer/plan/createplan.c
index 9ae1bf3..f90626c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
*best_path, int flags)
subplan = create_plan_recurse(root, best_path->subpath,
flags | CP_SMALL_TLIST);

-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
NULL);
+    /*
+     * In find_ec_member_for_tle(), child EC members are ignored if they
don't
+     * belong to the given relids. Thus, if this sort path is based on a
child
+     * relation, we must pass the relids of it. Otherwise, we will end-up
into
+     * an error requiring pathkey item.
+     */
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent)
?
+                                   best_path->path.parent->relids : NULL);

copy_generic_path_info(&plan->plan, (Path *) best_path);

Please separate this small adjustment in a patch of its own, with some
explanation of why we need it i.e. now this function can see SortPaths from
child (other) relations.

+    if (child_data)
+    {
+        /* Must be other rel as all child relations are marked OTHER_RELs
*/
+        Assert(IS_OTHER_REL(input_rel));

I think we should check IS_OTHER_REL() and Assert(child_data). That way we
know
that the code in the if block is executed for OTHER relation.

-    if ((root->hasHavingQual || parse->groupingSets) &&
+    if (!child_data && (root->hasHavingQual || parse->groupingSets) &&

Degenerate grouping will never see child relations, so instead of checking
for child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment
there explaining the assertion.

+     *
+     * If we are performing grouping for a child relation, fetch can_sort
from
+     * the child_data to avoid re-calculating same.
*/
-    can_sort = (gd && gd->rollups != NIL)
-        || grouping_is_sortable(parse->groupClause);
+    can_sort = child_data ? child_data->can_sort : ((gd &&
gd->rollups != NIL) ||
+
grouping_is_sortable(parse->groupClause));

Instead of adding a conditional here, we can compute these values before
create_grouping_paths() is called from grouping_planner() and then pass them
down to try_partitionwise_grouping(). I have attached a patch which refactors
this code. Please see if this refactoring is useful. In the attached patch, I
have handled can_sort, can_hash and partial aggregation costs. More on the
last component below.

Changes look good to me, and the refactoring will be useful for the
partitionwise patches.

However, would it be good to add agg_costs to GroupPathExtraData too?
Also, can we pass this to add_partial_paths_to_grouping_rel() and
add_paths_to_grouping_rel() to avoid passing can_sort, can_hash and the
cost-related details to them individually?

/*
* Figure out whether a PartialAggregate/Finalize Aggregate execution
@@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
* partial paths for partially_grouped_rel; that way, later code can
* easily consider both parallel and non-parallel approaches to
grouping.
*/
-    if (try_parallel_aggregation)
+    if (!child_data && !(agg_costs->hasNonPartial ||
agg_costs->hasNonSerial))
{
-        PathTarget *partial_grouping_target;
-
[... clipped ...]
+            get_agg_clause_costs(root, havingQual,
AGGSPLIT_FINAL_DESERIAL,
-                                 &agg_final_costs);
+                                 agg_final_costs);
}
+    }

With this change, we are computing partial aggregation costs even in the
cases when they will not be used, e.g. when there are no children and
parallel paths cannot be created. In the attached patch, I have refactored
the code such that they are computed when they are needed the first time and
re-used later.

+    if (child_data)
+    {
+        partial_grouping_target = child_data->partialRelTarget;
+        partially_grouped_rel->reltarget = partial_grouping_target;
+        agg_partial_costs = child_data->agg_partial_costs;
+        agg_final_costs = child_data->agg_final_costs;
+    }

I think, with the refactoring, we can get rid of the last two lines here. I
think we can get rid of this block entirely, but I have not reviewed the
entire code to confirm that.

static PathTarget *
-make_partial_grouping_target(PlannerInfo *root, PathTarget
*grouping_target)
+make_partial_grouping_target(PlannerInfo *root,
+                             PathTarget *grouping_target,
+                             Node *havingQual)

This looks like a refactoring change. It should go into one of the
refactoring patches or into a patch of its own.

This isn't a full review. I will continue reviewing this further.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#100

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#97)

3 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

This is in line with the enable_hashagg GUC. Do you think
enable_partitionwise_aggregate seems better? But then it will not be
consistent with other GUC names like enable_hashagg.

Well, if I had my way, enable_hashagg would be spelled
enable_hash_aggregate, too, but I wasn't involved in the project that
long ago. 100% consistency is hard to achieve here; the perfect
parallel of enable_hashagg would be enable_partitionwiseagg, but then
it would be inconsistent with enable_partitionwise_join unless we
renamed it to enable_partitionwisejoin, which I would rather not do.
I think the way the enable_blahfoo names were done was kinda
shortsighted -- it works OK as long as blahfoo is pretty short, like
mergejoin or hashagg or whatever, but if you have more or longer words
then I think it's hard to see where the word boundaries are without
any punctuation. And if you start abbreviating then you end up with
things like enable_pwagg which are not very easy to understand. So I
favor spelling everything out and punctuating it.

So the code for creating partially aggregated partial paths and partially
aggregated non-partial paths is the same, except that partial paths go into
partial_pathlist whereas non-partial paths go into the pathlist of
partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
when isPartialAgg = true seems correct. Also, as we have decided, this
function is responsible for creating all partially aggregated paths, both
partial and non-partial.

Am I missing something?

Hmm. I guess not. I think I didn't read this code well enough
previously. Please find attached proposed incremental patches (0001
and 0002) which hopefully make the code in this area a bit clearer.

+       /*
+        * If there are any fully aggregated partial paths present,
may be because
+        * of parallel Append over partitionwise aggregates, we must stick
a
+        * Gather or Gather Merge path atop the cheapest partial path.
+        */
+       if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

Well, these comments are not present anywhere other than this place. With
Parallel Append and partitionwise aggregates, it is now possible to have
fully aggregated partial paths. And thus we need to stick a Gather
and/or Gather Merge atop the cheapest partial path. And I believe the code
does the same. Am I missing something?

I misread the code. Sigh. I should have waited until today to send
that email and taken time to study it more carefully. But I still
don't think it's completely correct. It will not consider using a
pre-sorted path; the only strategies it can consider are cheapest path
+ Gather and cheapest path + explicit Sort (even if the cheapest path
is already correctly sorted!) + Gather Merge. It should really do
something similar to what add_paths_to_partial_grouping_rel() already
does: first call generate_gather_paths() and then, if the cheapest
partial path is not already correctly sorted, also try an explicit
Sort + Gather Merge. In fact, it looks like we can actually reuse
that logic exactly. See attached 0003 incremental patch; this changes
the outputs of one of your regression tests, but the new plan looks
better.
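
The two-step strategy described above can be sketched in isolation. This is
only a toy model, not planner code: ToyPartialPath, the TOY_* constants and
toy_gather_strategies() are hypothetical names invented for illustration, and
the only thing modelled is which combinations get considered, not their
costs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a partial Path; NOT the real PostgreSQL struct. */
typedef struct ToyPartialPath
{
    bool        sorted;     /* already ordered by the grouping keys? */
} ToyPartialPath;

/* Candidate ways to turn partial paths into non-partial ones. */
enum
{
    TOY_GATHER = 1 << 0,                 /* cheapest partial path + Gather */
    TOY_GATHER_MERGE_PRESORTED = 1 << 1, /* pre-sorted path + Gather Merge */
    TOY_SORT_GATHER_MERGE = 1 << 2       /* cheapest + Sort + Gather Merge */
};

/*
 * Return a bitmask of the strategies that would be considered.  paths[0] is
 * assumed to be the cheapest partial path, in the spirit of
 * linitial(rel->partial_pathlist).
 */
static int
toy_gather_strategies(const ToyPartialPath *paths, int npaths,
                      bool have_group_pathkeys)
{
    int         strategies = 0;
    int         i;

    assert(npaths >= 0);
    if (npaths == 0)
        return 0;

    /*
     * Step 1: the part delegated to generate_gather_paths(): a plain Gather
     * over the cheapest path, plus Gather Merge over any pre-sorted path.
     */
    strategies |= TOY_GATHER;
    for (i = 0; i < npaths; i++)
    {
        if (paths[i].sorted)
            strategies |= TOY_GATHER_MERGE_PRESORTED;
    }

    /*
     * Step 2: only when the cheapest path is not already sorted, also try an
     * explicit Sort followed by Gather Merge.
     */
    if (have_group_pathkeys && !paths[0].sorted)
        strategies |= TOY_SORT_GATHER_MERGE;

    return strategies;
}
```

Note that when the cheapest path is already sorted, the explicit Sort variant
is never produced, which is the case the earlier coding handled poorly.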

Some other notes:

There's a difference between performing partial aggregation in the
same process and performing it in a different process. hasNonPartial
tells us that we can't perform partial aggregation *at all*;
hasNonSerial only tells us that partial and final aggregation must
happen in the same process. This patch could possibly take advantage
of partial aggregation even when hasNonSerial is set. Finalize
Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
is OK with hasNonSerial = true as long as hasNonPartial = false. Now,
the bad news is that for this to actually work we'd need to define new
values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
complexity that adds. However, if we're not going to do that, I think
we'd better at least add some comments about it suggesting that someone
might want to do something about it in the future.
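
To make the distinction concrete, here is a small self-contained sketch. It
is not PostgreSQL code: the TOY_* names and bit values are made up for
illustration and merely echo the AGGSPLITOP_* names, and the assumption that
the existing cross-process modes add serialize/deserialize bits on top of
skip-final/combine is my reading of the current AggSplit values.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy split-operation bits; values are illustrative only. */
enum
{
    TOY_SPLITOP_COMBINE = 1 << 0,     /* combine partial states */
    TOY_SPLITOP_SKIPFINAL = 1 << 1,   /* skip the final function */
    TOY_SPLITOP_SERIALIZE = 1 << 2,   /* serialize state for another process */
    TOY_SPLITOP_DESERIALIZE = 1 << 3  /* deserialize state from another process */
};

typedef enum ToyPartialAggMode
{
    TOY_NO_PARTIAL_AGG,       /* hasNonPartial: cannot split at all */
    TOY_SAME_PROCESS_ONLY,    /* hasNonSerial: split, but stay in one process */
    TOY_ANY_PROCESS           /* split freely, even across processes */
} ToyPartialAggMode;

/* Classify what kind of partial aggregation the two flags permit. */
static ToyPartialAggMode
toy_partial_agg_mode(bool hasNonPartial, bool hasNonSerial)
{
    if (hasNonPartial)
        return TOY_NO_PARTIAL_AGG;
    if (hasNonSerial)
        return TOY_SAME_PROCESS_ONLY;
    return TOY_ANY_PROCESS;
}

/* Split bits for the Partial Aggregate step; 0 if it cannot be used. */
static int
toy_partial_stage_split(ToyPartialAggMode mode)
{
    switch (mode)
    {
        case TOY_SAME_PROCESS_ONLY:
            return TOY_SPLITOP_SKIPFINAL;
        case TOY_ANY_PROCESS:
            return TOY_SPLITOP_SKIPFINAL | TOY_SPLITOP_SERIALIZE;
        default:
            assert(mode == TOY_NO_PARTIAL_AGG);
            return 0;
    }
}

/* Split bits for the Finalize Aggregate step; 0 if it cannot be used. */
static int
toy_final_stage_split(ToyPartialAggMode mode)
{
    switch (mode)
    {
        case TOY_SAME_PROCESS_ONLY:
            return TOY_SPLITOP_COMBINE;
        case TOY_ANY_PROCESS:
            return TOY_SPLITOP_COMBINE | TOY_SPLITOP_DESERIALIZE;
        default:
            assert(mode == TOY_NO_PARTIAL_AGG);
            return 0;
    }
}
```

The TOY_SAME_PROCESS_ONLY row corresponds to the proposed AGGSPLIT_INITIAL and
AGGSPLIT_FINAL values: the same split as today, minus the serialization steps.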

I think that, in general, it's a good idea to keep the number of times
that create_grouping_paths() does something which is conditional on
whether child_data is NULL to a minimum. I haven't looked at what
Ashutosh tried to do there so I don't know whether it's good or bad,
but I like the idea, if we can do it cleanly.

It strikes me that we might want to consider refactoring things so
that create_grouping_paths() takes the grouping_rel and
partial_grouping_rel as input arguments. Right now, the
initialization of the child grouping and partial-grouping rels is
partly in try_partitionwise_aggregate(), which considers marking one
of them (but never both?) as dummy rels and create_grouping_paths()
which sets reloptkind, serverid, userid, etc. The logic of all of
this is a little unclear to me. Presumably, if the input rel is
dummy, then both the grouping_rel and the partial_grouping_rel are
also dummy. Also, presumably we should set the reloptkind correctly
as soon as we create the rel, not at some later stage.

Or maybe what we should do is split create_grouping_paths() into two
functions. Like this:

if (child_data)
{
partial_grouping_target = child_data->partialRelTarget;
partially_grouped_rel->reltarget = partial_grouping_target;
agg_partial_costs = child_data->agg_partial_costs;
agg_final_costs = child_data->agg_final_costs;
}

--- SPLIT IT HERE ---

/* Apply partitionwise aggregation technique, if possible. */
try_partitionwise_grouping(root, input_rel, grouped_rel,
partially_grouped_rel, target,
partial_grouping_target, agg_costs,
agg_partial_costs, agg_final_costs, gd,
can_sort, can_hash, havingQual, isPartialAgg);

It seems to me that everything from that point to the end is doing the
path generation and it's all pretty much the same for the parent and
child cases. But everything before that is either stuff that doesn't
apply to the child case at all (like the degenerate grouping case) or
stuff that should be done once and passed down (like
can_sort/can_hash). The only exception I see is some of the stuff
that sets up the upper rel at the top of the function, but maybe that
logic could be refactored into a separate function as well (like
initialize_grouping_rel). Then, instead of try_partitionwise_grouping()
actually calling create_grouping_paths(), it would call
initialize_grouping_rel() and then the path-adding function that we
split off from the bottom of the current create_grouping_paths(),
basically skipping all that stuff in the middle that we don't really
want to do in that case.
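
A rough sketch of the control flow this split would give, using a trace
string instead of real planner structures. Everything here is hypothetical:
only the name initialize_grouping_rel comes from the suggestion above, while
toy_add_grouping_paths stands in for the yet-unnamed path-adding half.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_STEPS 8

/* Append one step letter to a NUL-terminated trace buffer. */
static void
toy_step(char *trace, char step)
{
    size_t      len = strlen(trace);

    assert(len < TOY_MAX_STEPS);
    trace[len] = step;
    trace[len + 1] = '\0';
}

/* 'I': set up the upper rel (reloptkind, serverid, userid, ...). */
static void
toy_initialize_grouping_rel(char *trace)
{
    toy_step(trace, 'I');
}

/* 'P': path generation, identical for parent and child. */
static void
toy_add_grouping_paths(char *trace)
{
    toy_step(trace, 'P');
}

/*
 * Parent flow: everything, including the parts that apply only once.
 * trace must have room for TOY_MAX_STEPS + 1 characters.
 */
static char *
toy_create_grouping_paths(char *trace)
{
    trace[0] = '\0';
    toy_initialize_grouping_rel(trace);
    toy_step(trace, 'D');       /* degenerate grouping check */
    toy_step(trace, 'C');       /* compute can_sort / can_hash once */
    toy_add_grouping_paths(trace);
    return trace;
}

/* Child flow: skip the middle, reuse what the parent already computed. */
static char *
toy_child_grouping(char *trace)
{
    trace[0] = '\0';
    toy_initialize_grouping_rel(trace);
    toy_add_grouping_paths(trace);
    return trace;
}
```

With this shape the parent trace reads "IDCP" and the child trace "IP", so no
step in the shared half needs to test whether it is running for a child.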

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Refactor-code-to-add-Gather-Gather-Merge.patch (application/octet-stream)

From 67fe6d448323fe3b1d7c85dc14393b912b628127 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 6 Mar 2018 14:13:00 -0500
Subject: [PATCH 3/3] Refactor code to add Gather/Gather Merge.

---
 src/backend/optimizer/plan/planner.c        | 74 +++++++++++------------------
 src/test/regress/expected/partition_agg.out | 12 ++---
 2 files changed, 34 insertions(+), 52 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 934c9d322c..b87cba1f62 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -205,6 +205,7 @@ static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  bool can_hash,
 								  bool use_partial_pathlist,
 								  bool need_partial_agg);
+static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
 static void apply_scanjoin_target_to_paths(PlannerInfo *root, RelOptInfo *rel,
@@ -6245,39 +6246,13 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	}
 
 	/*
-	 * If there are any fully aggregated partial paths present, may be because
-	 * of parallel Append over partitionwise aggregates, we must stick a
-	 * Gather or Gather Merge path atop the cheapest partial path.
+	 * When partitionwise aggregate is used, we might have fully aggregated
+	 * paths in the partial pathlist, because add_paths_to_append_rel() will
+	 * consider a path for grouped_rel consisting of a Parallel Append of
+	 * non-partial paths from each child.
 	 */
-	if (grouped_rel->partial_pathlist)
-	{
-		Path	   *apath;
-		double		total_groups;
-
-		apath = (Path *) linitial(grouped_rel->partial_pathlist);
-		Assert(apath->parallel_workers > 0);
-		total_groups = apath->rows * apath->parallel_workers;
-
-		add_path(grouped_rel, (Path *)
-				 create_gather_path(root, grouped_rel, apath,
-									apath->pathtarget, NULL, &total_groups));
-
-		/*
-		 * Sorting the cheapest path to match the group keys and then applying
-		 * a Gather Merge node to the result might be a winning strategy.
-		 */
-		if (root->group_pathkeys)
-		{
-			apath = (Path *) create_sort_path(root, grouped_rel, apath,
-											  root->group_pathkeys, -1.0);
-
-			add_path(grouped_rel, (Path *)
-					 create_gather_merge_path(root, grouped_rel, apath,
-											  apath->pathtarget,
-											  root->group_pathkeys,
-											  NULL, &total_groups));
-		}
-	}
+	if (grouped_rel->partial_pathlist != NIL)
+		gather_grouping_paths(root, grouped_rel);
 }
 
 /*
@@ -6444,14 +6419,24 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 	if (need_partial_agg)
 		return;
 
-	/*
-	 * Try adding Gather or Gather Merge to partial paths to produce
-	 * non-partial paths.
-	 */
-	generate_gather_paths(root, partially_grouped_rel, true);
+	gather_grouping_paths(root, partially_grouped_rel);
+
+	set_cheapest(partially_grouped_rel);
+}
+
+/*
+ * Try adding Gather or Gather Merge to partial paths to produce non-partial
+ * paths.
+ */
+static void
+gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+	Path	   *cheapest_path;
+
+	generate_gather_paths(root, rel, true);
 
-	/* Get cheapest partial path from partially_grouped_rel */
-	cheapest_path = linitial(partially_grouped_rel->partial_pathlist);
+	/* Get cheapest partial path from rel */
+	cheapest_path = linitial(rel->partial_pathlist);
 
 	/*
 	 * generate_gather_paths won't consider sorting the cheapest path to match
@@ -6464,23 +6449,20 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 		double		total_groups;
 
 		total_groups = cheapest_path->rows * cheapest_path->parallel_workers;
-		path = (Path *) create_sort_path(root, partially_grouped_rel,
+		path = (Path *) create_sort_path(root, rel,
 										 cheapest_path, root->group_pathkeys,
 										 -1.0);
 		path = (Path *)
 			create_gather_merge_path(root,
-									 partially_grouped_rel,
+									 rel,
 									 path,
-									 partially_grouped_rel->reltarget,
+									 rel->reltarget,
 									 root->group_pathkeys,
 									 NULL,
 									 &total_groups);
 
-		add_path(partially_grouped_rel, path);
+		add_path(rel, path);
 	}
-
-	/* Now choose the best path(s) */
-	set_cheapest(partially_grouped_rel);
 }
 
 /*
diff --git a/src/test/regress/expected/partition_agg.out b/src/test/regress/expected/partition_agg.out
index a7fd6c1a64..612de4aded 100644
--- a/src/test/regress/expected/partition_agg.out
+++ b/src/test/regress/expected/partition_agg.out
@@ -934,12 +934,12 @@ ANALYZE pagg_tab;
 -- is not partial agg safe.
 EXPLAIN (COSTS OFF)
 SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 3 ORDER BY 1, 2, 3;
-                                          QUERY PLAN                                           
------------------------------------------------------------------------------------------------
- Sort
-   Sort Key: pagg_tab_p2_s1.a, (sum(pagg_tab_p2_s1.b)), (array_agg(DISTINCT pagg_tab_p2_s1.c))
-   ->  Gather
-         Workers Planned: 2
+                                             QUERY PLAN                                              
+-----------------------------------------------------------------------------------------------------
+ Gather Merge
+   Workers Planned: 2
+   ->  Sort
+         Sort Key: pagg_tab_p2_s1.a, (sum(pagg_tab_p2_s1.b)), (array_agg(DISTINCT pagg_tab_p2_s1.c))
          ->  Parallel Append
                ->  GroupAggregate
                      Group Key: pagg_tab_p2_s1.a
-- 
2.14.3 (Apple Git-98)

0002-Rejigger-test-for-grouped_rel-pathlist-NIL.patch (application/octet-stream)

From 90dbc5670600679e81e0d38f7a624dd3e96ceec6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 6 Mar 2018 13:26:35 -0500
Subject: [PATCH 2/3] Rejigger test for grouped_rel->pathlist == NIL.

The previous coding made this depend on child_data, but it seems to me that
it actually depends on isPartialAgg.  If isPartialAgg = false, then we
should have created at least one path for grouped_rel; otherwise, we will
not have done so.
---
 src/backend/optimizer/plan/planner.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a3c4e106ef..934c9d322c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3878,24 +3878,13 @@ create_grouping_paths(PlannerInfo *root,
 								  partially_grouped_rel, agg_costs,
 								  agg_final_costs, gd, can_sort, can_hash,
 								  dNumGroups, (List *) havingQual);
-	}
 
-	/* Give a helpful error if we failed to find any implementation */
-	if (!child_data && grouped_rel->pathlist == NIL)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("could not implement GROUP BY"),
-				 errdetail("Some of the datatypes only support hashing, while others only support sorting.")));
-	else if (child_data)
-	{
-		/*
-		 * Must have a path created above.  If path is present for the whole
-		 * relation, then it should also present for the child relation.  And
-		 * if not, we would have thrown an error already and thus will never
-		 * end up here.
-		 */
-		Assert(grouped_rel->pathlist != NIL ||
-			   partially_grouped_rel->pathlist != NIL);
+		/* Give a helpful error if we failed to find any implementation */
+		if (grouped_rel->pathlist == NIL)
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("could not implement GROUP BY"),
+					 errdetail("Some of the datatypes only support hashing, while others only support sorting.")));
 	}
 
 	/*
-- 
2.14.3 (Apple Git-98)

0001-Tidy-up-calls-to-add_paths_to_partial_grouping_rel.patch (application/octet-stream)

From 0909f45df69206af26d87d3d2b92eb1e3866743f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 6 Mar 2018 13:18:08 -0500
Subject: [PATCH 1/3] Tidy up calls to add_paths_to_partial_grouping_rel.

- Move Assert() from callers into add_paths_to_partial_grouping_rel.
- Don't bother to Assert() anything about agg_final_costs; it's not passed anyway.
- Add some comments.
---
 src/backend/optimizer/plan/planner.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e9b60584bb..a3c4e106ef 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3839,25 +3839,30 @@ create_grouping_paths(PlannerInfo *root,
 							   agg_partial_costs, agg_final_costs, gd,
 							   can_sort, can_hash, havingQual, isPartialAgg);
 
+	/*
+	 * Try parallel aggregation, if possible.  This produces only partially
+	 * grouped paths, since the same group could be produced by more than one
+	 * worker.
+	 */
 	if (try_parallel_aggregation)
-	{
-		Assert(agg_partial_costs && agg_final_costs);
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
 										  agg_partial_costs,
 										  gd, can_sort, can_hash,
 										  true, isPartialAgg);
-	}
 
+	/*
+	 * Now generate non-partial paths.  When isPartialAgg = true, we're
+	 * generating paths for a child rel whose partition keys are not contained
+	 * in the grouping keys, so we can only generate partially grouped paths.
+	 * Otherwise, we can do complete grouping.
+	 */
 	if (isPartialAgg)
-	{
-		Assert(agg_partial_costs && agg_final_costs);
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
 										  agg_partial_costs,
 										  gd, can_sort, can_hash,
 										  false, true);
-	}
 	else
 	{
 		double		dNumGroups;
@@ -6320,6 +6325,8 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 	double		dNumPartialGroups = 0;
 	ListCell   *lc;
 
+	Assert(agg_partial_costs != NULL);
+
 	/* Get either total or partial cheapest path */
 	cheapest_path = use_partial_pathlist ? linitial(input_rel->partial_pathlist) :
 		input_rel->cheapest_total_path;
-- 
2.14.3 (Apple Git-98)

#101

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#99)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 6, 2018 at 7:52 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Changes look good to me, and the refactoring will be useful for the
partitionwise patches.

However, would it be good to add agg_costs to GroupPathExtraData too?
Also, can we pass this to add_partial_paths_to_grouping_rel() and
add_paths_to_grouping_rel() to avoid passing can_sort, can_hash and the
cost-related details to them individually?

I think so too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#102

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#101)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 7, 2018 at 10:04 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Mar 6, 2018 at 7:52 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Changes look good to me, and the refactoring will be useful for the
partitionwise patches.

However, would it be good to add agg_costs to GroupPathExtraData too?
Also, can we pass this to add_partial_paths_to_grouping_rel() and
add_paths_to_grouping_rel() to avoid passing can_sort, can_hash and the
cost-related details to them individually?

I think so too.

Here's a patch doing that. agg_costs is calculated well before we
populate the other members of GroupPathExtraData, which means that we
either set a pointer to agg_costs in GroupPathExtraData or memcpy
its contents. The first option would make GroupPathExtraData asymmetric
about the costs it holds, some as pointers and some as whole
structures. Holding whole structures allows us to compute those anywhere
without worrying about memory allocation or variable lifetime, so I
am reluctant to make all costs pointers. So, I have not added
agg_costs to GroupPathExtraData yet.
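
For readers skimming the attached patch, the pattern it relies on (costs
embedded by value, guarded by a partial_costs_set flag so they are computed
at most once) can be sketched on its own. This is a toy model only:
ToyAggCosts, ToyGroupExtra, the ncomputations counter and both functions are
hypothetical, and the "cost computation" is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for AggClauseCosts. */
typedef struct ToyAggCosts
{
    int         numAggs;
} ToyAggCosts;

/* Toy stand-in for GroupPathExtraData: costs are held by value. */
typedef struct ToyGroupExtra
{
    bool        partial_costs_set;  /* are the two cost structs valid? */
    int         ncomputations;      /* instrumentation for this sketch only */
    ToyAggCosts agg_partial_costs;
    ToyAggCosts agg_final_costs;
} ToyGroupExtra;

/* Compute the partial/final costs the first time they are needed. */
static void
toy_ensure_partial_costs(ToyGroupExtra *extra)
{
    if (extra->partial_costs_set)
        return;                 /* already valid, reuse */

    memset(&extra->agg_partial_costs, 0, sizeof(ToyAggCosts));
    memset(&extra->agg_final_costs, 0, sizeof(ToyAggCosts));
    extra->agg_partial_costs.numAggs = 1;   /* placeholder "computation" */
    extra->agg_final_costs.numAggs = 1;
    extra->ncomputations++;
    extra->partial_costs_set = true;
}

/* Ask for the costs nrequests times; report how often they were computed. */
static int
toy_count_computations(int nrequests)
{
    ToyGroupExtra extra;
    int         i;

    assert(nrequests >= 0);
    memset(&extra, 0, sizeof(extra));   /* no heap allocation involved */
    for (i = 0; i < nrequests; i++)
        toy_ensure_partial_costs(&extra);
    return extra.ncomputations;
}
```

Because the cost structs live inside the extra struct, a caller can keep the
whole thing on its stack and hand it down, which is the lifetime argument
made above for not switching to pointers.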

We could make GroupPathExtraData a variable in grouping_planner()
and populate its members as we progress. But I think that's a digression
from the original purpose of the patch.

I observe that we are computing agg_costs, the number of groups etc. again
in postgres_fdw, so there seems to be merit in passing those values
as GroupPathExtraData to the FDW as well, like what you have done with
OtherUpperExtraData. But we will come to that once we have
straightened out the partition-wise aggregate patches.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_cgp_refactor.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index de1257d..d47dc7e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -109,6 +109,28 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } grouping_sets_data;
 
+/*
+ * Struct for extra information passed to create_grouping_paths
+ *
+ * can_hash is true if hash-based grouping is possible, false otherwise.
+ * can_sort is true if sort-based grouping is possible, false otherwise.
+ * can_partial_agg is true if partial aggregation is possible, false otherwise.
+ * partial_costs_set indicates whether agg_partial_costs and agg_final_costs
+ *		have valid costs set. Both of those are computed only when partial
+ *		aggregation is required.
+ * agg_partial_costs gives partial aggregation costs.
+ * agg_final_costs gives finalization costs.
+ */
+typedef struct
+{
+	bool		can_hash;
+	bool		can_sort;
+	bool		can_partial_agg;
+	bool		partial_costs_set;
+	AggClauseCosts agg_partial_costs;
+	AggClauseCosts agg_final_costs;
+} GroupPathExtraData;
+
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -138,7 +160,8 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 					  RelOptInfo *input_rel,
 					  PathTarget *target,
 					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd);
+					  grouping_sets_data *gd,
+					  GroupPathExtraData *extra);
 static void consider_groupingsets_paths(PlannerInfo *root,
 							RelOptInfo *grouped_rel,
 							Path *path,
@@ -190,18 +213,20 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
-						  const AggClauseCosts *agg_final_costs,
-						  grouping_sets_data *gd, bool can_sort, bool can_hash,
+						  grouping_sets_data *gd,
+						  GroupPathExtraData *extra,
 						  double dNumGroups, List *havingQual);
 static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash);
+								  GroupPathExtraData *extra);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
+				 RelOptInfo *grouped_rel, GroupPathExtraData *extra);
+static void compute_group_path_extra_data(PlannerInfo *root,
+							  GroupPathExtraData *extra,
+							  grouping_sets_data *gd,
+							  const AggClauseCosts *agg_costs);
 
 
 /*****************************************************************************
@@ -1981,11 +2006,17 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 */
 		if (have_grouping)
 		{
+			GroupPathExtraData group_extra;
+
+			compute_group_path_extra_data(root, &group_extra, gset_data,
+										  &agg_costs);
+
 			current_rel = create_grouping_paths(root,
 												current_rel,
 												grouping_target,
 												&agg_costs,
-												gset_data);
+												gset_data,
+												&group_extra);
 			/* Fix things up if grouping_target contains SRFs */
 			if (parse->hasTargetSRFs)
 				adjust_paths_for_srfs(root, current_rel,
@@ -3595,6 +3626,67 @@ estimate_hashagg_tablesize(Path *path, const AggClauseCosts *agg_costs,
 }
 
 /*
+ * compute_group_path_extra_data
+ *	  Compute extra information required for grouping operation specified in
+ *	  the query.
+ */
+static void
+compute_group_path_extra_data(PlannerInfo *root, GroupPathExtraData *extra,
+							  grouping_sets_data *gd,
+							  const AggClauseCosts *agg_costs)
+{
+	Query	   *parse = root->parse;
+
+	/*
+	 * Determine whether it's possible to perform sort-based implementations
+	 * of grouping.  (Note that if groupClause is empty,
+	 * grouping_is_sortable() is trivially true, and all the
+	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
+	 * every surviving input path.)
+	 *
+	 * If we have grouping sets, we might be able to sort some but not all of
+	 * them; in this case, we need can_sort to be true as long as we must
+	 * consider any sorted-input plan.
+	 */
+	extra->can_sort = (gd && gd->rollups != NIL) ||
+					  grouping_is_sortable(parse->groupClause);
+
+	/*
+	 * Determine whether we should consider hash-based implementations of
+	 * grouping.
+	 *
+	 * Hashed aggregation only applies if we're grouping. If we have grouping
+	 * sets, some groups might be hashable but others not; in this case we set
+	 * can_hash true as long as there is nothing globally preventing us from
+	 * hashing (and we should therefore consider plans with hashes).
+	 *
+	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
+	 * aggregates.  (Doing so would imply storing *all* the input values in
+	 * the hash table, and/or running many sorts in parallel, either of which
+	 * seems like a certain loser.)  We similarly don't support ordered-set
+	 * aggregates in hashed aggregation, but that case is also included in the
+	 * numOrderedAggs count.
+	 *
+	 * Note: grouping_is_hashable() is much more expensive to check than the
+	 * other gating conditions, so we want to do it last.
+	 */
+	extra->can_hash = (parse->groupClause != NIL &&
+					   agg_costs->numOrderedAggs == 0 &&
+					   (gd ? gd->any_hashable :
+							 grouping_is_hashable(parse->groupClause)));
+
+	/*
+	 * Set partial aggregation costs if we are going to calculate partial
+	 * aggregates in create_grouping_paths().
+	 */
+	extra->partial_costs_set = false;
+
+	/* Is partial aggregation possible? */
+	extra->can_partial_agg = (agg_costs->hasNonPartial ||
+							  agg_costs->hasNonSerial);
+}
+
+/*
  * create_grouping_paths
  *
  * Build a new upperrel containing Paths for grouping and/or aggregation.
@@ -3624,17 +3716,16 @@ create_grouping_paths(PlannerInfo *root,
 					  RelOptInfo *input_rel,
 					  PathTarget *target,
 					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd)
+					  grouping_sets_data *gd,
+					  GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
 	RelOptInfo *partially_grouped_rel;
-	AggClauseCosts agg_partial_costs;	/* parallel only */
-	AggClauseCosts agg_final_costs; /* parallel only */
+	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
+	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 	double		dNumGroups;
-	bool		can_hash;
-	bool		can_sort;
 	bool		try_parallel_aggregation;
 
 	/*
@@ -3750,48 +3841,11 @@ create_grouping_paths(PlannerInfo *root,
 									  gd);
 
 	/*
-	 * Determine whether it's possible to perform sort-based implementations
-	 * of grouping.  (Note that if groupClause is empty,
-	 * grouping_is_sortable() is trivially true, and all the
-	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
-	 * every surviving input path.)
-	 *
-	 * If we have grouping sets, we might be able to sort some but not all of
-	 * them; in this case, we need can_sort to be true as long as we must
-	 * consider any sorted-input plan.
-	 */
-	can_sort = (gd && gd->rollups != NIL)
-		|| grouping_is_sortable(parse->groupClause);
-
-	/*
-	 * Determine whether we should consider hash-based implementations of
-	 * grouping.
-	 *
-	 * Hashed aggregation only applies if we're grouping. If we have grouping
-	 * sets, some groups might be hashable but others not; in this case we set
-	 * can_hash true as long as there is nothing globally preventing us from
-	 * hashing (and we should therefore consider plans with hashes).
-	 *
-	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
-	 * aggregates.  (Doing so would imply storing *all* the input values in
-	 * the hash table, and/or running many sorts in parallel, either of which
-	 * seems like a certain loser.)  We similarly don't support ordered-set
-	 * aggregates in hashed aggregation, but that case is also included in the
-	 * numOrderedAggs count.
-	 *
-	 * Note: grouping_is_hashable() is much more expensive to check than the
-	 * other gating conditions, so we want to do it last.
-	 */
-	can_hash = (parse->groupClause != NIL &&
-				agg_costs->numOrderedAggs == 0 &&
-				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
-
-	/*
 	 * Figure out whether a PartialAggregate/Finalize Aggregate execution
 	 * strategy is viable.
 	 */
 	try_parallel_aggregation = can_parallel_agg(root, input_rel, grouped_rel,
-												agg_costs);
+												extra);
 
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
@@ -3812,39 +3866,45 @@ create_grouping_paths(PlannerInfo *root,
 		partial_grouping_target = make_partial_grouping_target(root, target);
 		partially_grouped_rel->reltarget = partial_grouping_target;
 
-		/*
-		 * Collect statistics about aggregates for estimating costs of
-		 * performing aggregation in parallel.
-		 */
-		MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
-		MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-		if (parse->hasAggs)
+		/* Set partial aggregation costs, if not already computed. */
+		if (!extra->partial_costs_set)
 		{
-			/* partial phase */
-			get_agg_clause_costs(root, (Node *) partial_grouping_target->exprs,
-								 AGGSPLIT_INITIAL_SERIAL,
-								 &agg_partial_costs);
-
-			/* final phase */
-			get_agg_clause_costs(root, (Node *) target->exprs,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-			get_agg_clause_costs(root, parse->havingQual,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
+			/*
+			 * Collect statistics about aggregates for estimating costs of
+			 * performing aggregation in parallel.
+			 */
+			MemSet(agg_partial_costs, 0, sizeof(AggClauseCosts));
+			MemSet(agg_final_costs, 0, sizeof(AggClauseCosts));
+			if (parse->hasAggs)
+			{
+				/* partial phase */
+				get_agg_clause_costs(root,
+									 (Node *) partial_grouping_target->exprs,
+									 AGGSPLIT_INITIAL_SERIAL,
+									 agg_partial_costs);
+
+				/* final phase */
+				get_agg_clause_costs(root, (Node *) target->exprs,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 agg_final_costs);
+				get_agg_clause_costs(root, parse->havingQual,
+									 AGGSPLIT_FINAL_DESERIAL,
+									 agg_final_costs);
+			}
+
+			extra->partial_costs_set = true;
 		}
 
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel,
-										  &agg_partial_costs,
-										  gd, can_sort, can_hash);
+										  gd, extra);
 	}
 
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
 							  partially_grouped_rel, agg_costs,
-							  &agg_final_costs, gd, can_sort, can_hash,
-							  dNumGroups, (List *) parse->havingQual);
+							  gd, extra, dNumGroups,
+							  (List *) parse->havingQual);
 
 	/* Give a helpful error if we failed to find any implementation */
 	if (grouped_rel->pathlist == NIL)
@@ -6006,13 +6066,16 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
-						  const AggClauseCosts *agg_final_costs,
-						  grouping_sets_data *gd, bool can_sort, bool can_hash,
+						  grouping_sets_data *gd,
+						  GroupPathExtraData *extra,
 						  double dNumGroups, List *havingQual)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	ListCell   *lc;
+	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+	bool		can_sort = extra->can_sort;
+	bool		can_hash = extra->can_hash;
 
 	if (can_sort)
 	{
@@ -6219,16 +6282,18 @@ static void
 add_paths_to_partial_grouping_rel(PlannerInfo *root,
 								  RelOptInfo *input_rel,
 								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
 								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash)
+								  GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_partial_path = linitial(input_rel->partial_pathlist);
 	Size		hashaggtablesize;
 	double		dNumPartialGroups = 0;
 	ListCell   *lc;
+	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
+	bool		can_sort = extra->can_sort;
+	bool		can_hash = extra->can_hash;
+
 
 	/* Estimate number of partial groups. */
 	dNumPartialGroups = get_number_of_groups(root,
@@ -6380,7 +6445,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
  */
 static bool
 can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs)
+				 RelOptInfo *grouped_rel, GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 
@@ -6407,7 +6472,7 @@ can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 		/* We don't know how to do grouping sets in parallel. */
 		return false;
 	}
-	else if (agg_costs->hasNonPartial || agg_costs->hasNonSerial)
+	else if (!extra->can_partial_agg)
 	{
 		/* Insufficient support for partial mode. */
 		return false;

#103

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#98)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 6, 2018 at 4:59 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

Hi Jeevan,
I am back reviewing this. Here are some comments.

@@ -1415,7 +1413,8 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
* the unparameterized Append path we are constructing for the
parent.
* If not, there's no workable unparameterized path.
*/
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
accumulate_append_subpath(childrel->cheapest_total_path,
&subpaths, NULL);
else
@@ -1683,6 +1682,13 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
Path       *subpath;

+            if (childrel->pathlist == NIL)
+            {
+                /* failed to make a suitable path for this child */
+                subpaths_valid = false;
+                break;
+            }
+
When can childrel->pathlist be NIL?

Done. Sorry it was leftover from my earlier trial. Not needed now. Removed.

diff --git a/src/backend/optimizer/plan/createplan.c
b/src/backend/optimizer/plan/createplan.c
index 9ae1bf3..f90626c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1670,7 +1670,15 @@ create_sort_plan(PlannerInfo *root, SortPath
*best_path, int flags)
subplan = create_plan_recurse(root, best_path->subpath,
flags | CP_SMALL_TLIST);

-    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
NULL);
+    /*
+     * In find_ec_member_for_tle(), child EC members are ignored if they
don't
+     * belong to the given relids. Thus, if this sort path is based on a
child
+     * relation, we must pass the relids of it. Otherwise, we will end-up
into
+     * an error requiring pathkey item.
+     */
+    plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
+                                   IS_OTHER_REL(best_path->subpath->parent)
?
+                                   best_path->path.parent->relids : NULL);

copy_generic_path_info(&plan->plan, (Path *) best_path);

Please separate this small adjustment in a patch of its own, with some
explanation of why we need it i.e. now this function can see SortPaths from
child (other) relations.

I am not sure it is good to split this out of the main patch. The main
patch exposes this requirement, so it seems better to keep these changes
in the main patch itself.
However, I have no objection to extracting it into a separate small patch. Let
me know your views.

+    if (child_data)
+    {
+        /* Must be other rel as all child relations are marked OTHER_RELs
*/
+        Assert(IS_OTHER_REL(input_rel));
I think we should check IS_OTHER_REL() and Assert(child_data). That way we
know
that the code in the if block is executed for OTHER relation.

Done.

-    if ((root->hasHavingQual || parse->groupingSets) &&
+    if (!child_data && (root->hasHavingQual || parse->groupingSets) &&
Degenerate grouping will never see child relations, so instead of checking
for
child_data, Assert (!IS_OTHER_REL()) inside this block. Add a comment there
explaining the assertion.

Done.

+     *
+     * If we are performing grouping for a child relation, fetch can_sort
from
+     * the child_data to avoid re-calculating same.
*/
-    can_sort = (gd && gd->rollups != NIL)
-        || grouping_is_sortable(parse->groupClause);
+    can_sort = child_data ? child_data->can_sort : ((gd &&
gd->rollups != NIL) ||
+
grouping_is_sortable(parse->groupClause));
Instead of adding a conditional here, we can compute these values before
create_grouping_paths() is called from grouping_planner() and then pass
them
down to try_partitionwise_grouping(). I have attached a patch which
refactors
this code. Please see if this refactoring is useful. In the attached
patch, I
have handled can_sort, can_hash and partial aggregation costs. More on the
last
component below.
/*
* Figure out whether a PartialAggregate/Finalize Aggregate execution
@@ -3740,10 +3789,8 @@ create_grouping_paths(PlannerInfo *root,
* partial paths for partially_grouped_rel; that way, later code can
* easily consider both parallel and non-parallel approaches to
grouping.
*/
-    if (try_parallel_aggregation)
+    if (!child_data && !(agg_costs->hasNonPartial ||
agg_costs->hasNonSerial))
{
-        PathTarget *partial_grouping_target;
-
[... clipped ...]
+            get_agg_clause_costs(root, havingQual,
AGGSPLIT_FINAL_DESERIAL,
-                                 &agg_final_costs);
+                                 agg_final_costs);
}
+    }
With this change, we are computing partial aggregation costs even in
the cases when
those will not be used e.g. when there are no children and parallel paths
can
not be created. In the attached patch, I have refactored the code such that
they are computed when they are needed the first time and re-used later.
+    if (child_data)
+    {
+        partial_grouping_target = child_data->partialRelTarget;
+        partially_grouped_rel->reltarget = partial_grouping_target;
+        agg_partial_costs = child_data->agg_partial_costs;
+        agg_final_costs = child_data->agg_final_costs;
+    }
I think, with the refactoring, we can get rid of the last two lines here. I
think we can get rid of this block entirely, but I have not reviewed the
entire
code to confirm that.

I have added your patch as one of the refactoring patches and rebased my
changes over it.
Yes, it removed this block and a few other conditions too.
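
The compute-once approach described in the review comment above (partial aggregation costs are computed the first time they are needed and reused afterwards) can be sketched roughly as follows. This is a minimal standalone illustration, not the patch itself: DemoAggCosts, DemoExtraData and the demo_* functions are hypothetical stand-ins for AggClauseCosts, GroupPathExtraData and get_agg_clause_costs(), and only the partial_costs_set flag is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for AggClauseCosts; only one field is modeled. */
typedef struct DemoAggCosts
{
	int			numAggs;
} DemoAggCosts;

/* Hypothetical stand-in for GroupPathExtraData from the patch. */
typedef struct DemoExtraData
{
	bool		partial_costs_set;	/* have the costs below been computed? */
	DemoAggCosts agg_partial_costs;
	DemoAggCosts agg_final_costs;
} DemoExtraData;

/* Counts how often the (pretend) expensive computation actually runs. */
int			demo_cost_computations = 0;

/* Assumed to be expensive, like walking the target list for aggregates. */
void
demo_compute_costs(DemoAggCosts *partial, DemoAggCosts *final, int naggs)
{
	memset(partial, 0, sizeof(*partial));
	memset(final, 0, sizeof(*final));
	partial->numAggs = naggs;
	final->numAggs = naggs;
	demo_cost_computations++;
}

/* Compute the costs on first use only; later callers reuse the cached values. */
const DemoAggCosts *
demo_get_partial_costs(DemoExtraData *extra, int naggs)
{
	assert(extra != NULL);

	if (!extra->partial_costs_set)
	{
		demo_compute_costs(&extra->agg_partial_costs,
						   &extra->agg_final_costs, naggs);
		extra->partial_costs_set = true;
	}

	return &extra->agg_partial_costs;
}
```

With this shape, the parent relation and every child relation can share one zero-initialized DemoExtraData, and the cost computation runs at most once no matter how many of them ask for it.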

static PathTarget *
-make_partial_grouping_target(PlannerInfo *root, PathTarget
*grouping_target)
+make_partial_grouping_target(PlannerInfo *root,
+                             PathTarget *grouping_target,
+                             Node *havingQual)
This looks like a refactoring change. Should go to one of the refactoring
patches or in a patch of its own.

OK. Refactored into separate patch.

Will post a new patchset with these changes included.

This isn't full review. I will continue reviewing this further.

Sure.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#104

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#100)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 7, 2018 at 1:45 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 6, 2018 at 5:31 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

This is in line with the enable_hashagg GUC. Do you think
enable_partitionwise_aggregate seems better? But then it will not be
consistent with other GUC names like enable_hashagg.

Well, if I had my way, enable_hashagg would be spelled
enable_hash_aggregate, too, but I wasn't involved in the project that
long ago. 100% consistency is hard to achieve here; the perfect
parallel of enable_hashagg would be enable_partitionwiseagg, but then
it would be inconsistent with enable_partitionwise_join unless we
renamed it to enable_partitionwisejoin, which I would rather not do.
I think the way the enable_blahfoo names were done was kinda
shortsighted -- it works OK as long as blahfoo is pretty short, like
mergejoin or hashagg or whatever, but if you have more or longer words
then I think it's hard to see where the word boundaries are without
any punctuation. And if you start abbreviating then you end up with
things like enable_pwagg which are not very easy to understand. So I
favor spelling everything out and punctuating it.

Understood, and that makes sense.
Updated.

So the code for creating partially aggregated partial paths and partially
aggregated non-partial paths is the same, except that partial paths go into
partial_pathlist whereas non-partial paths go into pathlist of
partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel()
when isPartialAgg = true seems correct. Also, as we have decided, this
function is responsible for creating all partially aggregated paths,
including both partial and non-partial.

Am I missing something?

Hmm. I guess not. I think I didn't read this code well enough
previously. Please find attached proposed incremental patches (0001
and 0002) which hopefully make the code in this area a bit clearer.

Yep. Thanks for these patches.
I have merged these changes into my main (0007) patch now.

+       /*
+        * If there are any fully aggregated partial paths present, may be because
+        * of parallel Append over partitionwise aggregates, we must stick a
+        * Gather or Gather Merge path atop the cheapest partial path.
+        */
+       if (grouped_rel->partial_pathlist)

This comment is copied from someplace where the code does what the
comment says, but here it doesn't do any such thing.

Well, these comments are not present anywhere else than this place. With
Parallel Append and Partitionwise aggregates, it is now possible to have
fully aggregated partial paths. And thus we need to stick a Gather
and/or Gather Merge atop the cheapest partial path. And I believe the code
does the same. Am I missing something?

I misread the code. Sigh. I should have waited until today to send
that email and taken time to study it more carefully. But I still
don't think it's completely correct. It will not consider using a
pre-sorted path; the only strategies it can consider are cheapest path
+ Gather and cheapest path + explicit Sort (even if the cheapest path
is already correctly sorted!) + Gather Merge. It should really do
something similar to what add_paths_to_partial_grouping_rel() already
does: first call generate_gather_paths() and then, if the cheapest
partial path is not already correctly sorted, also try an explicit
Sort + Gather Merge. In fact, it looks like we can actually reuse
that logic exactly. See attached 0003 incremental patch; this changes
the outputs of one of your regression tests, but the new plan looks
better.
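
The strategy selection described above can be sketched as a small standalone function. This is only an illustration under simplifying assumptions: DemoPartialPath and the DEMO_* flags are hypothetical, a path is reduced to a cost plus a "sorted on the grouping keys" flag, and the real generate_gather_paths() works on Path nodes and pathkeys rather than booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a partial path. */
typedef struct DemoPartialPath
{
	double		cost;			/* total cost of the path */
	bool		sorted;			/* already sorted on the grouping keys? */
} DemoPartialPath;

/* Bit flags for the strategies a planner could consider (names assumed). */
enum
{
	DEMO_GATHER = 1 << 0,				/* Gather over the cheapest path */
	DEMO_GATHER_MERGE = 1 << 1,			/* Gather Merge over a pre-sorted path */
	DEMO_SORT_GATHER_MERGE = 1 << 2		/* explicit Sort + Gather Merge */
};

/*
 * Return the set of strategies worth considering for the given partial
 * paths: always a plain Gather, a Gather Merge if any path is pre-sorted,
 * and an explicit Sort + Gather Merge only when the cheapest path is not
 * already sorted.
 */
int
demo_gather_strategies(const DemoPartialPath *paths, int npaths)
{
	int			strategies = DEMO_GATHER;
	int			cheapest = 0;

	assert(paths != NULL && npaths > 0);

	for (int i = 0; i < npaths; i++)
	{
		if (paths[i].cost < paths[cheapest].cost)
			cheapest = i;
		if (paths[i].sorted)
			strategies |= DEMO_GATHER_MERGE;
	}

	if (!paths[cheapest].sorted)
		strategies |= DEMO_SORT_GATHER_MERGE;

	return strategies;
}
```

The point of the sketch is the last condition: a pre-sorted cheapest path never gets a redundant Sort stacked on top of it.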

This looks like a refactoring patch, so I have added it as a separate patch
(0005) in the patch-set.
The related changes in the PWA patch are merged accordingly too.

Attached new patch-set with these changes merged and fixing review comments
from Ashutosh Bapat along with his GroupPathExtraData changes patch.

Some other notes:

There's a difference between performing partial aggregation in the
same process and performing it in a different process. hasNonPartial
tells us that we can't perform partial aggregation *at all*;
hasNonSerial only tells us that partial and final aggregation must
happen in the same process. This patch could possibly take advantage
of partial aggregation even when hasNonSerial is set. Finalize
Aggregate -> Append -> N copies of { Partial Aggregate -> Whatever }
is OK with hasNonSerial = true as long as hasNonPartial = false. Now,
the bad news is that for this to actually work we'd need to define new
values of AggSplit, like AGGSPLIT_INITIAL = AGGSPLITOP_SKIPFINAL and
AGGSPLIT_FINAL = AGGSPLITOP_COMBINE, and I'm not sure how much
complexity that adds. However, if we're not going to do that, I think
we'd better at least add some comments about it suggesting that someone
might want to do something about it in the future.
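
To make the distinction concrete, here is a rough standalone sketch of how the two flags could map onto split modes. Everything prefixed DEMO_ or demo_ is hypothetical; the bit values are meant to mirror the AGGSPLITOP_* flags in PostgreSQL's headers as I understand them, and the two serialization-free modes are the ones floated above, which do not exist in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative copies of the AGGSPLITOP_* bits (values assumed). */
#define DEMO_SPLITOP_COMBINE		0x01	/* combine partial states */
#define DEMO_SPLITOP_SKIPFINAL		0x02	/* skip the final function */
#define DEMO_SPLITOP_SERIALIZE		0x04	/* serialize the output state */
#define DEMO_SPLITOP_DESERIALIZE	0x08	/* deserialize the input state */

/* Existing modes, used when partial and final steps may cross processes. */
#define DEMO_SPLIT_INITIAL_SERIAL \
	(DEMO_SPLITOP_SKIPFINAL | DEMO_SPLITOP_SERIALIZE)
#define DEMO_SPLIT_FINAL_DESERIAL \
	(DEMO_SPLITOP_COMBINE | DEMO_SPLITOP_DESERIALIZE)

/* Proposed modes for same-process use; hypothetical, not in PostgreSQL. */
#define DEMO_SPLIT_INITIAL	DEMO_SPLITOP_SKIPFINAL
#define DEMO_SPLIT_FINAL	DEMO_SPLITOP_COMBINE

typedef enum DemoPartialMode
{
	DEMO_PARTIAL_NONE,			/* partial aggregation impossible */
	DEMO_PARTIAL_SAME_PROCESS,	/* possible, but states cannot be serialized */
	DEMO_PARTIAL_ANY_PROCESS	/* possible, and states can be shipped */
} DemoPartialMode;

/* Classify what the two AggClauseCosts-style flags allow. */
DemoPartialMode
demo_partial_mode(bool has_non_partial, bool has_non_serial)
{
	if (has_non_partial)
		return DEMO_PARTIAL_NONE;
	if (has_non_serial)
		return DEMO_PARTIAL_SAME_PROCESS;
	return DEMO_PARTIAL_ANY_PROCESS;
}

/* Pick split modes for the partial and final steps; false if none apply. */
bool
demo_pick_split_modes(DemoPartialMode mode, int *initial_split, int *final_split)
{
	assert(initial_split != NULL && final_split != NULL);

	switch (mode)
	{
		case DEMO_PARTIAL_ANY_PROCESS:
			*initial_split = DEMO_SPLIT_INITIAL_SERIAL;
			*final_split = DEMO_SPLIT_FINAL_DESERIAL;
			return true;
		case DEMO_PARTIAL_SAME_PROCESS:
			*initial_split = DEMO_SPLIT_INITIAL;
			*final_split = DEMO_SPLIT_FINAL;
			return true;
		case DEMO_PARTIAL_NONE:
			break;
	}
	return false;
}
```

In this sketch, a Finalize Aggregate over an Append of per-partition Partial Aggregates would fall under DEMO_PARTIAL_SAME_PROCESS whenever hasNonSerial is set but hasNonPartial is not.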

I am not very familiar with these, so I will add a comment as you suggested.

I think that, in general, it's a good idea to keep the number of times
that create_grouping_paths() does something which is conditional on
whether child_data is NULL to a minimum. I haven't looked at what
Ashutosh tried to do there so I don't know whether it's good or bad,
but I like the idea, if we can do it cleanly.

It strikes me that we might want to consider refactoring things so
that create_grouping_paths() takes the grouping_rel and
partial_grouping_rel as input arguments. Right now, the
initialization of the child grouping and partial-grouping rels is
partly in try_partitionwise_aggregate(), which considers marking one
of them (but never both?) as dummy rels and create_grouping_paths()
which sets reloptkind, serverid, userid, etc. The logic of all of
this is a little unclear to me. Presumably, if the input rel is
dummy, then both the grouping_rel and the partial_grouping_rel are
also dummy. Also, presumably we should set the reloptkind correctly
as soon as we create the rel, not at some later stage.

Or maybe what we should do is split create_grouping_paths() into two
functions. Like this:

if (child_data)
{
partial_grouping_target = child_data->partialRelTarget;
partially_grouped_rel->reltarget = partial_grouping_target;
agg_partial_costs = child_data->agg_partial_costs;
agg_final_costs = child_data->agg_final_costs;
}
--- SPLIT IT HERE ---
/* Apply partitionwise aggregation technique, if possible. */
try_partitionwise_grouping(root, input_rel, grouped_rel,
partially_grouped_rel, target,
partial_grouping_target, agg_costs,
agg_partial_costs, agg_final_costs, gd,
can_sort, can_hash, havingQual,
isPartialAgg);

It seems to me that everything from that point to the end is doing the
path generation and it's all pretty much the same for the parent and
child cases. But everything before that is either stuff that doesn't
apply to the child case at all (like the degenerate grouping case) or
stuff that should be done once and passed down (like
can_sort/can_hash). The only exception I see is some of the stuff
that sets up the upper rel at the top of the function, but maybe that
logic could be refactored into a separate function as well (like
initialize_grouping_rel). Then, instead of try_partitionwise_grouping()
actually calling create_grouping_paths(), it would call
initialize_grouping_rel() and then the path-adding function that we
split off from the bottom of the current create_grouping_paths(),
basically skipping all that stuff in the middle that we don't really
want to do in that case.

I will have a look over this proposal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#105

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#104)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
Here are some more review comments esp. on
try_partitionwise_grouping() function. BTW name of that function
doesn't go in sync with enable_partitionwise_aggregation (which btw is
in sync with enable_fooagg GUCs). But it goes in sync with
create_grouping_paths(). It looks like we have already confused
aggregation and grouping e.g. enable_hashagg may affect path creation
when there is no aggregation involved i.e. only grouping is involved
but create_grouping_paths will create paths when there is no grouping
involved. Generally it looks like the function names use grouping to
mean both aggregation and grouping but GUCs use aggregation to mean
both of those. So, the naming in this patch looks consistent with the
current naming conventions.

+        grouped_rel->part_scheme = input_rel->part_scheme;
+        grouped_rel->nparts = nparts;
+        grouped_rel->boundinfo = input_rel->boundinfo;
+        grouped_rel->part_rels = part_rels;

You need to set the part_exprs which will provide partition keys for this
partitioned relation. I think, we should include all the part_exprs of
input_rel which are part of GROUP BY clause. Since any other expressions in
part_exprs are not part of GROUP BY clause, they can not appear in the
targetlist without an aggregate on top. So they can't be part of the partition
keys of the grouped relation.

In create_grouping_paths() we fetch both partial as well as fully grouped rel
for given input relation. But in case of partial aggregation, we don't need
fully grouped rel since we are not computing full aggregates for the children.
Since fetch_upper_rel() creates a relation when one doesn't exist, we are
unnecessarily creating fully grouped rels in this case. For thousands of
partitions that's a lot of memory wasted.

I see a similar issue with create_grouping_paths() when we are computing only
full aggregates (either because partial aggregation is not possible or because
parallelism is not possible). In that case, we unconditionally create partially
grouped rels. That too would waste a lot of memory.

I think that partially_grouped_rel, when required, is partitioned irrespective
of whether we do full aggregation per partition or not. So, we should have its
part_rels and other partitioning properties set. I agree that right now we
won't use this information anywhere. It may be useful in case we devise a way
to use partially_grouped_rel directly without using grouped_rel for planning
beyond grouping, which seems unlikely.

+
+        /*
+         * Parallel aggregation requires partial target, so compute it here
+         * and translate all vars. For partial aggregation, we need it
+         * anyways.
+         */
+        partial_target = make_partial_grouping_target(root, target,
+                                                      havingQual);

Don't we have this available in partially_grouped_rel?

That shows one asymmetry that Robert's refactoring has introduced. We don't set
reltarget of grouped_rel but set reltarget of partially_grouped_rel. If
reltarget of grouped_rel changes across paths (the reason why we have not set
it in grouped_rel), shouldn't reltarget of partially grouped paths change
accordingly?

+
+/*
+ * group_by_has_partkey
+ *
+ * Returns true, if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, false otherwise.
+ */
+static bool
+group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,
+                     List *groupClause)

We could modify this function to return the list of part_exprs which are part
of group clause and use that as the partition keys of the grouped_rel if
required. If group by doesn't have all the partition keys, the function would
return a NULL list.
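
A rough sketch of that variant, with expressions reduced to plain strings so that it stands alone. The function name demo_group_by_partkeys() and its signature are hypothetical; the real function would compare expression trees from part_exprs against the GROUP BY clause rather than call strcmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Is this partition key one of the GROUP BY expressions? */
bool
demo_in_group_by(const char *key, const char *const *group_by, int ngroup)
{
	for (int i = 0; i < ngroup; i++)
	{
		if (strcmp(key, group_by[i]) == 0)
			return true;
	}
	return false;
}

/*
 * If every partition key appears in GROUP BY, copy the keys into matched[]
 * (which must have room for nkeys entries) and return their count.
 * Otherwise return 0, the analogue of returning a NULL list.
 */
int
demo_group_by_partkeys(const char *const *partkeys, int nkeys,
					   const char *const *group_by, int ngroup,
					   const char **matched)
{
	assert(partkeys != NULL && group_by != NULL && matched != NULL);

	for (int i = 0; i < nkeys; i++)
	{
		if (!demo_in_group_by(partkeys[i], group_by, ngroup))
			return 0;			/* a key is missing: no usable list */
		matched[i] = partkeys[i];
	}

	return nkeys;
}
```

A caller could then treat a non-zero result both as the "true" answer of the current function and as the partition keys to install on the grouped relation.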

Right now, in case of full aggregation, partially_grouped_rel is populated with
the partial paths created by adding partial aggregation to the partial paths of
input relation. But we are not trying to create partial paths by (parallel)
appending the (non)partial paths from the child partially_grouped_rel. Have we
thought about that? Would such paths have different shapes from the ones that
we create now and will they be better?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#106

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#105)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 2:45 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

+        grouped_rel->part_scheme = input_rel->part_scheme;
+        grouped_rel->nparts = nparts;
+        grouped_rel->boundinfo = input_rel->boundinfo;
+        grouped_rel->part_rels = part_rels;
You need to set the part_exprs which will provide partition keys for this
partitioned relation. I think, we should include all the part_exprs of
input_rel which are part of GROUP BY clause. Since any other expressions in
part_exprs are not part of GROUP BY clause, they can not appear in the
targetlist without an aggregate on top. So they can't be part of the partition
keys of the grouped relation.

In create_grouping_paths() we fetch both partial as well as fully grouped rel
for given input relation. But in case of partial aggregation, we don't need
fully grouped rel since we are not computing full aggregates for the children.
Since fetch_upper_rel() creates a relation when one doesn't exist, we are
unnecessarily creating fully grouped rels in this case. For thousands of
partitions that's a lot of memory wasted.

I see a similar issue with create_grouping_paths() when we are computing only
full aggregates (either because partial aggregation is not possible or because
parallelism is not possible). In that case, we unconditionally create partially
grouped rels. That too would waste a lot of memory.

This kind of goes along with the suggestion I made yesterday to
introduce a new function, which at the time I proposed calling
initialize_grouping_rel(), to set up new grouped or partially grouped
relations. By doing that it would be easier to ensure the
initialization is always done in a consistent way but only for the
relations we actually need. But maybe we should call it
fetch_grouping_rel() instead. The idea would be that instead of
calling fetch_upper_rel() we would call fetch_grouping_rel() when it
is a question of the grouped or partially grouped relation. It would
either return the existing relation or initialize a new one for us. I
think that would make it fairly easy to initialize only the ones we're
going to need.
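
As a very rough sketch of the get-or-create idea: the function below returns the existing relation of the requested kind, or initializes it on first request, so that nothing is set up for kinds nobody asks for. All names here (DemoRel, DemoPlannerState, demo_fetch_grouping_rel) are hypothetical, and the real function would build on fetch_upper_rel() and fill in reloptkind, serverid, userid and so on.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum DemoRelKind
{
	DEMO_REL_GROUPED,
	DEMO_REL_PARTIALLY_GROUPED,
	DEMO_REL_KINDS				/* number of kinds; must be last */
} DemoRelKind;

/* Hypothetical, heavily reduced stand-in for RelOptInfo. */
typedef struct DemoRel
{
	DemoRelKind kind;
	bool		is_other_rel;	/* set once, at creation time */
} DemoRel;

/* Hypothetical per-input-relation state holding the upper rels. */
typedef struct DemoPlannerState
{
	DemoRel		slots[DEMO_REL_KINDS];
	bool		present[DEMO_REL_KINDS];
	int			ninitialized;	/* how many rels were actually set up */
} DemoPlannerState;

void
demo_state_init(DemoPlannerState *state)
{
	assert(state != NULL);
	memset(state, 0, sizeof(*state));
}

/* Return the existing rel of this kind, or initialize it on first request. */
DemoRel *
demo_fetch_grouping_rel(DemoPlannerState *state, DemoRelKind kind,
						bool input_is_other_rel)
{
	assert(state != NULL && kind < DEMO_REL_KINDS);

	if (!state->present[kind])
	{
		state->slots[kind].kind = kind;
		state->slots[kind].is_other_rel = input_is_other_rel;
		state->present[kind] = true;
		state->ninitialized++;
	}

	return &state->slots[kind];
}
```

Keeping the initialization inside the fetch function is what makes it consistent: every caller gets a rel whose kind was set when it was created, not at some later stage.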

Also, I don't think we should be paranoid about memory usage here.
It's good to avoid creating new rels that are obviously not needed,
not only because of memory consumption but also because of the CPU
consumption involved, but I don't want to contort the code to squeeze
every last byte of memory out of this.

On a related note, I'm not sure that this code is correct:

+       if (!isPartialAgg)
+       {
+               grouped_rel->part_scheme = input_rel->part_scheme;
+               grouped_rel->nparts = nparts;
+               grouped_rel->boundinfo = input_rel->boundinfo;
+               grouped_rel->part_rels = part_rels;
+       }

It's not obvious to me why this should be done only when
!isPartialAgg. The comments claim that the partially grouped child
rels can't be considered partitions of the top-level partially
grouped rel, but it seems to me that we could consider them that way.
Maybe I'm missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#107

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#105)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 1:15 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Mar 7, 2018 at 8:07 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
Here are some more review comments esp. on
try_partitionwise_grouping() function. BTW name of that function
doesn't go in sync with enable_partitionwise_aggregation (which btw is
in sync with enable_fooagg GUCs). But it goes in sync with
create_grouping_paths(). It looks like we have already confused
aggregation and grouping e.g. enable_hashagg may affect path creation
when there is no aggregation involved i.e. only grouping is involved
but create_grouping_paths will create paths when there is no grouping
involved. Generally it looks like the function names use grouping to
mean both aggregation and grouping but GUCs use aggregation to mean
both of those. So, the naming in this patch looks consistent with the
current naming conventions.
+        grouped_rel->part_scheme = input_rel->part_scheme;
+        grouped_rel->nparts = nparts;
+        grouped_rel->boundinfo = input_rel->boundinfo;
+        grouped_rel->part_rels = part_rels;
You need to set the part_exprs which will provide partition keys for this
partitioned relation. I think, we should include all the part_exprs of
input_rel which are part of GROUP BY clause. Since any other expressions in
part_exprs are not part of GROUP BY clause, they can not appear in the
targetlist without an aggregate on top. So they can't be part of the partition
keys of the grouped relation.

In create_grouping_paths() we fetch both partial as well as fully grouped rel
for given input relation. But in case of partial aggregation, we don't need
fully grouped rel since we are not computing full aggregates for the children.
Since fetch_upper_rel() creates a relation when one doesn't exist, we are
unnecessarily creating fully grouped rels in this case. For thousands of
partitions that's a lot of memory wasted.

I see a similar issue with create_grouping_paths() when we are computing only
full aggregates (either because partial aggregation is not possible or because
parallelism is not possible). In that case, we unconditionally create partially
grouped rels. That too would waste a lot of memory.

I think that partially_grouped_rel, when required, is partitioned irrespective
of whether we do full aggregation per partition or not. So, we should have its
part_rels and other partitioning properties set. I agree that right now we
won't use this information anywhere. It may be useful in case we devise a way
to use partially_grouped_rel directly without using grouped_rel for planning
beyond grouping, which seems unlikely.
+
+        /*
+         * Parallel aggregation requires partial target, so compute it here
+         * and translate all vars. For partial aggregation, we need it
+         * anyways.
+         */
+        partial_target = make_partial_grouping_target(root, target,
+                                                      havingQual);
Don't we have this available in partially_grouped_rel?

That shows one asymmetry that Robert's refactoring has introduced. We don't set
reltarget of grouped_rel but set reltarget of partially_grouped_rel. If
reltarget of grouped_rel changes across paths (the reason why we have not set
it in grouped_rel), shouldn't reltarget of partially grouped paths change
accordingly?

I am not sure why we don't set reltarget in the grouped_rel too.

But if we set it the way we do for partially_grouped_rel, it will be a lot
easier for partitionwise aggregate, as we then don't have to pass the target to
functions creating paths, like create_append_path. As things stand, we need to
update generate_gather_paths() to take a target too, since it is now being
called on grouped_rel, in which reltarget is not set.

But yes, if there is any specific reason we can't do so, then I agree with
what Ashutosh said. I am not aware of any such reason, though.

+
+/*
+ * group_by_has_partkey
+ *
+ * Returns true, if all the partition keys of the given relation are part of
+ * the GROUP BY clauses, false otherwise.
+ */
+static bool
+group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,
+                     List *groupClause)
We could modify this function to return the list of part_exprs which are part
of group clause and use that as the partition keys of the grouped_rel if
required. If group by doesn't have all the partition keys, the function would
return a NULL list.

Right now, in case of full aggregation, partially_grouped_rel is populated with
the partial paths created by adding partial aggregation to the partial paths of
input relation. But we are not trying to create partial paths by (parallel)
appending the (non)partial paths from the child partially_grouped_rel. Have we
thought about that? Would such paths have different shapes from the ones that
we create now and will they be better?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#108

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#107)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 9:15 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I am not sure why we don't set reltarget into the grouped_rel too.

But if we do so like we did in partially_grouped_rel, then it will be a lot
easier for partitionwise aggregate as then we don't have to pass target to
functions creating paths like create_append_path. We now need to update
generate_gather_paths() to take target too as it is now being called on
grouped_rel in which reltarget is not set.

But yes, if there is any specific reason we can't do so, then I think the
same as Ashutosh said. I am not aware of any such reason, though.

I see no problem with setting reltarget for the grouped_rel. Before
we added partially_grouped_rel, that rel computed paths with two
different targets: partial paths had the partial grouping target, and
non-partial paths had the ordinary grouping target. But that's fixed
now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#109

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#108)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 7:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 8, 2018 at 9:15 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I am not sure why we don't set reltarget into the grouped_rel too.

But if we do so like we did in partially_grouped_rel, then it will be a lot
easier for partitionwise aggregate as then we don't have to pass target to
functions creating paths like create_append_path. We now need to update
generate_gather_paths() to take target too as it is now being called on
grouped_rel in which reltarget is not set.

But yes, if there is any specific reason we can't do so, then I think the
same as Ashutosh said. I am not aware of any such reason, though.

I see no problem with setting reltarget for the grouped_rel. Before
we added partially_grouped_rel, that rel computed paths with two
different targets: partial paths had the partial grouping target, and
non-partial paths had the ordinary grouping target. But that's fixed
now.

OK.
Will update my changes accordingly.
If we set reltarget into the grouped_rel now, then I don't need one of the
refactoring patch which is passing target to the path creation functions.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#110

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#109)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 8:02 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Thu, Mar 8, 2018 at 7:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 8, 2018 at 9:15 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I am not sure why we don't set reltarget into the grouped_rel too.

But if we do so like we did in partially_grouped_rel, then it will be a lot
easier for partitionwise aggregate as then we don't have to pass target to
functions creating paths like create_append_path. We now need to update
generate_gather_paths() to take target too as it is now being called on
grouped_rel in which reltarget is not set.

But yes, if there is any specific reason we can't do so, then I think the
same as Ashutosh said. I am not aware of any such reason, though.

I see no problem with setting reltarget for the grouped_rel. Before
we added partially_grouped_rel, that rel computed paths with two
different targets: partial paths had the partial grouping target, and
non-partial paths had the ordinary grouping target. But that's fixed
now.

OK.
Will update my changes accordingly.
If we set reltarget into the grouped_rel now, then I don't need one of the
refactoring patch which is passing target to the path creation functions.

For some reason we do not set reltarget of any of the upper relations.
I don't know why, and browsing through the comments in
grouping_planner() did not help either, including the one below, which
precedes the code that creates an array of upper relation targets.
/*
* Save the various upper-rel PathTargets we just computed into
* root->upper_targets[]. The core code doesn't use this, but it
* provides a convenient place for extensions to get at the info. For
* consistency, we save all the intermediate targets, even though some
* of the corresponding upperrels might not be needed for this query.
*/
Why don't we just set those in the corresponding RelOptInfos? Maybe
we should do that for all the upper rels and not just grouping_rel.
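
To make the question concrete, here is a rough sketch of copying per-kind targets from an array on the root into the matching relations. It is a toy model and not planner code: UpperKind, Target, Rel, Root and set_upper_rel_targets() are all hypothetical, and the real planner keeps a list of upper rels per kind rather than a single pointer.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner structures. */
typedef enum
{
	UPPER_GROUP_AGG,
	UPPER_WINDOW,
	UPPER_FINAL,
	UPPER_NKINDS
} UpperKind;

typedef struct
{
	const char *desc;			/* description of what the target emits */
} Target;

typedef struct
{
	Target	   *reltarget;		/* target every path of this rel produces */
} Rel;

typedef struct
{
	Target	   *upper_targets[UPPER_NKINDS];	/* targets saved per kind */
	Rel		   *upper_rels[UPPER_NKINDS];		/* NULL if rel is not built */
} Root;

/*
 * Store each saved target in the corresponding upper rel, skipping the
 * kinds for which no rel was created for this query.
 */
static void
set_upper_rel_targets(Root *root)
{
	for (int kind = 0; kind < UPPER_NKINDS; kind++)
	{
		if (root->upper_rels[kind] != NULL)
			root->upper_rels[kind]->reltarget = root->upper_targets[kind];
	}
}
```

With something along these lines, path-creation functions could read the target from the rel instead of taking it as an extra argument.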

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#111

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#106)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This kind of goes along with the suggestion I made yesterday to
introduce a new function, which at the time I proposed calling
initialize_grouping_rel(), to set up new grouped or partially grouped
relations. By doing that it would be easier to ensure the
initialization is always done in a consistent way but only for the
relations we actually need. But maybe we should call it
fetch_grouping_rel() instead. The idea would be that instead of
calling fetch_upper_rel() we would call fetch_grouping_rel() when it
is a question of the grouped or partially grouped relation. It would
either return the existing relation or initialize a new one for us. I
think that would make it fairly easy to initialize only the ones we're
going to need.

Hmm. I am working on refactoring the code to do something like this.
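
As a rough illustration of the "return the existing relation or initialize a new one" idea, here is a toy sketch. It is not the function from any patch in this thread: Planner, GroupingRel, GroupingKind and the fixed-size slot array are hypothetical simplifications, and an unsigned bitmask stands in for Relids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
	GROUP_AGG,
	PARTIAL_GROUP_AGG
} GroupingKind;

typedef struct
{
	bool		in_use;
	GroupingKind kind;
	unsigned	relids;			/* bitmask stand-in for Relids */
	bool		consider_parallel;
} GroupingRel;

#define MAX_GROUPING_RELS 16

typedef struct
{
	GroupingRel rels[MAX_GROUPING_RELS];
} Planner;

/*
 * Return the grouping rel identified by (kind, relids).  If it does not
 * exist yet, claim a free slot and initialize it, so every grouping rel
 * is set up in exactly one place and only when somebody asks for it.
 */
static GroupingRel *
fetch_grouping_rel(Planner *planner, GroupingKind kind, unsigned relids,
				   bool consider_parallel)
{
	GroupingRel *free_slot = NULL;

	for (size_t i = 0; i < MAX_GROUPING_RELS; i++)
	{
		GroupingRel *rel = &planner->rels[i];

		if (rel->in_use && rel->kind == kind && rel->relids == relids)
			return rel;			/* already initialized */
		if (!rel->in_use && free_slot == NULL)
			free_slot = rel;
	}

	assert(free_slot != NULL);	/* toy model: no dynamic growth */
	free_slot->in_use = true;
	free_slot->kind = kind;
	free_slot->relids = relids;
	free_slot->consider_parallel = consider_parallel;
	return free_slot;
}
```

Because the partially grouped rel is created only on the first fetch, a caller that never needs partial aggregation never pays for initializing it.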

On a related note, I'm not sure that this code is correct:
+       if (!isPartialAgg)
+       {
+               grouped_rel->part_scheme = input_rel->part_scheme;
+               grouped_rel->nparts = nparts;
+               grouped_rel->boundinfo = input_rel->boundinfo;
+               grouped_rel->part_rels = part_rels;
+       }
It's not obvious to me why this should be done only when
!isPartialAgg. The comments claim that the partially grouped child
rels can't be considered partitions of the top-level partially
grouped rel, but it seems to me that we could consider them that way.
Maybe I'm missing something.

When we are performing partial aggregation, the GROUP BY clause does not
contain the partition keys. This means that the targetlists of the grouped
relation and the partially grouped relation do not have bare partition keys.
So, for a relation sitting on top of this (partially) grouped relation, the
partition key doesn't exist, and we can't consider the grouped or partially
grouped relation as partitioned.
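
A small sketch of that rule, under simplifying assumptions: ToyRel and set_grouped_rel_partitioning() are hypothetical, and only two of the partitioning fields are modelled. The point is just that the partitioning properties are inherited from the input only when aggregation is complete.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the partitioning fields of a RelOptInfo. */
typedef struct
{
	const void *part_scheme;	/* NULL means "not partitioned" */
	int			nparts;
} ToyRel;

/*
 * With full aggregation the GROUP BY contains the partition keys, so the
 * grouped rel stays partitioned the same way as its input.  With partial
 * aggregation the keys are absent from the output, so the result is
 * marked as not partitioned.
 */
static void
set_grouped_rel_partitioning(ToyRel *grouped_rel, const ToyRel *input_rel,
							 bool is_partial_agg)
{
	if (is_partial_agg)
	{
		grouped_rel->part_scheme = NULL;
		grouped_rel->nparts = 0;
		return;
	}

	grouped_rel->part_scheme = input_rel->part_scheme;
	grouped_rel->nparts = input_rel->nparts;
}
```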

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#112

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#111)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This kind of goes along with the suggestion I made yesterday to
introduce a new function, which at the time I proposed calling
initialize_grouping_rel(), to set up new grouped or partially grouped
relations. By doing that it would be easier to ensure the
initialization is always done in a consistent way but only for the
relations we actually need. But maybe we should call it
fetch_grouping_rel() instead. The idea would be that instead of
calling fetch_upper_rel() we would call fetch_grouping_rel() when it
is a question of the grouped or partially grouped relation. It would
either return the existing relation or initialize a new one for us. I
think that would make it fairly easy to initialize only the ones we're
going to need.

Hmm. I am working on refactoring the code to do something like this.

Here's a patch doing the same. I have split create_grouping_paths() into
three functions:
1. try_degenerate_grouping_paths(), to handle degenerate grouping paths;
2. make_grouping_rels(), to create the grouping rels (the partially grouped
   rel and the grouped rel), which also sets some properties in
   GroupPathExtraData;
3. populate_grouping_rels_with_paths(), to populate the grouping rels with
   paths.

With those changes, I have been able to get rid of partially grouped rels
when they are not necessary. But I haven't tried to get rid of grouped_rels
when they are not needed.
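
The early exit in try_degenerate_grouping_paths() tests the negation of the degenerate-grouping condition. The toy sketch below spells out both forms so the equivalence can be checked over all inputs. QueryShape and the two helper functions are hypothetical; the four booleans only mirror the planner flags the condition reads (HAVING qual, grouping sets, aggregates, GROUP BY).

```c
#include <stdbool.h>

/* Hypothetical summary of the query properties the check looks at. */
typedef struct
{
	bool		has_having_qual;
	bool		has_grouping_sets;
	bool		has_aggs;
	bool		has_group_clause;
} QueryShape;

/* Degenerate: HAVING and/or grouping sets, but no aggregates, no GROUP BY. */
static bool
is_degenerate_grouping(const QueryShape *q)
{
	return (q->has_having_qual || q->has_grouping_sets) &&
		!q->has_aggs && !q->has_group_clause;
}

/* The same condition negated via De Morgan's laws, as an early-exit test. */
static bool
is_not_degenerate_grouping(const QueryShape *q)
{
	return (!q->has_having_qual && !q->has_grouping_sets) ||
		q->has_aggs || q->has_group_clause;
}
```

Iterating over the 16 combinations of the four flags is a cheap way to confirm that the two predicates are exact complements.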

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call to
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

With those two things remaining I am posting this patch, so that
Jeevan Chalke can merge this patch into his patches and also merge
some of his changes related to mine and Robert's changes. Let me know
if this refactoring looks good.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

cgp_split.patch (text/x-patch; charset=US-ASCII)

commit c882a54585b93b8976d2d9a25f55d18ed27d69c1
Author: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Date:   Thu Mar 8 15:33:51 2018 +0530

    Split create_grouping_paths()
    
    Separate code in create_grouping_paths() into two functions: first to
    create the grouping relations, partial grouped rel and grouped rel,
    second to populate those with paths. These two functions are then
    called from create_grouping_paths() and try_partitionwise_grouping().
    
    As part of this separate degenerate grouping case into a function of
    its own (try_degenerate_grouping_paths()) to be called only from
    create_grouping_paths().
    
    Ashutosh Bapat.

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a5a049f..1611975 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -123,12 +123,27 @@ typedef struct
  */
 typedef struct
 {
+	/* Data which remains constant once set. */
 	bool		can_hash;
 	bool		can_sort;
 	bool		can_partial_agg;
 	bool		partial_costs_set;
 	AggClauseCosts agg_partial_costs;
 	AggClauseCosts agg_final_costs;
+
+	/*
+	 * Data which depends upon the input and grouping relations and hence may
+	 * change across partitioning hierarchy
+	 */
+	bool		consider_parallel;	/* It probably remains same across
+									 * partitioning hierarchy. But double
+									 * check.
+									 */
+	bool		try_parallel_aggregation;
+	bool		partitionwise_grouping;
+	bool		partial_partitionwise_grouping;
+	Node	   *havingQual;
+	PathTarget *target;
 } GroupPathExtraData;
 
 /* Local functions */
@@ -161,9 +176,7 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 					  RelOptInfo *input_rel,
 					  PathTarget *target,
 					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd,
-					  GroupPathExtraData *extra,
-					  OtherUpperPathExtraData *child_data);
+					  grouping_sets_data *gd);
 static void consider_groupingsets_paths(PlannerInfo *root,
 							RelOptInfo *grouped_rel,
 							Path *path,
@@ -240,14 +253,31 @@ static void try_partitionwise_grouping(PlannerInfo *root,
 						   RelOptInfo *input_rel,
 						   RelOptInfo *grouped_rel,
 						   RelOptInfo *partially_grouped_rel,
-						   PathTarget *target,
 						   const AggClauseCosts *agg_costs,
 						   grouping_sets_data *gd,
-						   GroupPathExtraData *extra,
-						   Node *havingQual,
-						   bool forcePartialAgg);
+						   GroupPathExtraData *extra);
 static bool group_by_has_partkey(RelOptInfo *input_rel, PathTarget *target,
 					 List *groupClause);
+static RelOptInfo *try_degenerate_grouping_paths(PlannerInfo *root,
+							  RelOptInfo *input_rel, PathTarget *target);
+static bool can_partitionwise_grouping(PlannerInfo *root, RelOptInfo *input_rel,
+						   PathTarget *target, bool force_partial_agg,
+						   GroupPathExtraData *extra, grouping_sets_data *gd,
+						   bool *is_partial_agg);
+static RelOptInfo *make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
+							  bool consider_parallel, bool is_partial);
+static void populate_grouping_rels_with_paths(PlannerInfo *root,
+								  RelOptInfo *input_rel,
+								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
+								  grouping_sets_data *gd,
+								  const AggClauseCosts *agg_costs,
+								  bool is_partial_agg,
+								  GroupPathExtraData *extra);
+static void make_grouping_rels(PlannerInfo *root, RelOptInfo *input_rel,
+				   bool force_partial_agg, GroupPathExtraData *extra,
+				   grouping_sets_data *gd, RelOptInfo **grouped_rel,
+				   RelOptInfo **partially_grouped_rel);
 
 
 /*****************************************************************************
@@ -1961,18 +1991,11 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 */
 		if (have_grouping)
 		{
-			GroupPathExtraData group_extra;
-
-			compute_group_path_extra_data(root, &group_extra, gset_data,
-										  &agg_costs);
-
 			current_rel = create_grouping_paths(root,
 												current_rel,
 												grouping_target,
 												&agg_costs,
-												gset_data,
-												&group_extra,
-												NULL);
+												gset_data);
 			/* Fix things up if grouping_target contains SRFs */
 			if (parse->hasTargetSRFs)
 				adjust_paths_for_srfs(root, current_rel,
@@ -3641,76 +3664,124 @@ compute_group_path_extra_data(PlannerInfo *root, GroupPathExtraData *extra,
 	extra->partial_costs_set = false;
 
 	/* Is partial aggregation possible? */
-	extra->can_partial_agg = (agg_costs->hasNonPartial ||
-							  agg_costs->hasNonSerial);
+	extra->can_partial_agg = (!agg_costs->hasNonPartial &&
+							  !agg_costs->hasNonSerial);
 }
 
 /*
- * create_grouping_paths
- *
- * Build a new upperrel containing Paths for grouping and/or aggregation.
- * Along the way, we also build an upperrel for Paths which are partially
- * grouped and/or aggregated.  A partially grouped and/or aggregated path
- * needs a FinalizeAggregate node to complete the aggregation.  Currently,
- * the only partially grouped paths we build are also partial paths; that
- * is, they need a Gather and then a FinalizeAggregate.
- *
- * input_rel: contains the source-data Paths
- * target: the pathtarget for the result Paths to compute
- * agg_costs: cost info about all aggregates in query (in AGGSPLIT_SIMPLE mode)
- * rollup_lists: list of grouping sets, or NIL if not doing grouping sets
- * rollup_groupclauses: list of grouping clauses for grouping sets,
- *		or NIL if not doing grouping sets
- *
- * Note: all Paths in input_rel are expected to return the target computed
- * by make_group_input_target.
- *
- * We need to consider sorted and hashed aggregation in the same function,
- * because otherwise (1) it would be harder to throw an appropriate error
- * message if neither way works, and (2) we should not allow hashtable size
- * considerations to dissuade us from using hashing if sorting is not possible.
+ * try_degenerate_grouping_paths
+ *	   Create paths for degenerate grouping case when the query has a HAVING
+ *	   qual and/or grouping sets, but no aggregates and no GROUP BY (which
+ *	   implies that the grouping sets are all empty).
  *
- * If input rel itself is "other" relation, then we are creating grouping paths
- * for the child relation. Thus child_data should provide child specifc details
- * like havingQual, whether it should compute partial aggregation or not etc.
+ * This is a degenerate case in which we are supposed to emit either zero or
+ * one row for each grouping set depending on whether HAVING succeeds.
+ * Furthermore, there cannot be any variables in either HAVING or the
+ * targetlist, so we actually do not need the FROM table at all!	We can just
+ * throw away the plan-so-far and generate a Result node.  This is a
+ * sufficiently unusual corner case that it's not worth contorting the
+ * structure of this module to avoid having to generate the earlier paths in
+ * the first place.
  */
 static RelOptInfo *
-create_grouping_paths(PlannerInfo *root,
-					  RelOptInfo *input_rel,
-					  PathTarget *target,
-					  const AggClauseCosts *agg_costs,
-					  grouping_sets_data *gd,
-					  GroupPathExtraData *extra,
-					  OtherUpperPathExtraData *child_data)
+try_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
+							  PathTarget *target)
 {
+	int			nrows;
+	Path	   *path;
 	Query	   *parse = root->parse;
-	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *grouped_rel;
-	RelOptInfo *partially_grouped_rel;
-	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
-	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
-	bool		try_parallel_aggregation;
-	PathTarget *partial_grouping_target = NULL;
-	Node	   *havingQual = child_data ? child_data->havingQual : parse->havingQual;
-	bool		isPartialAgg = child_data ? child_data->isPartialAgg : false;
 
-	if (IS_OTHER_REL(input_rel))
+	/*
+	 * Rule out non-degenerate case first. This is exactly negation of the
+	 * condition specified in the prologue of this function.
+	 */
+	if (((!root->hasHavingQual && !parse->groupingSets) ||
+		  parse->hasAggs || parse->groupClause != NIL))
+		return NULL;
+
+	/*
+	 * None of the paths created below benefit from distributing the
+	 * computation across the partitions. Hence degenerate grouping is never
+	 * performed partitionwise.
+	 */
+	Assert(!IS_OTHER_REL(input_rel));
+
+	grouped_rel = make_grouping_rel(root, input_rel, false, false);
+
+	nrows = list_length(parse->groupingSets);
+	if (nrows > 1)
 	{
-		/* For other rel i.e. child rel, we must have child_data */
-		Assert(child_data);
+		/*
+		 * Doesn't seem worthwhile writing code to cons up a
+		 * generate_series or a values scan to emit multiple rows. Instead
+		 * just make N clones and append them.  (With a volatile HAVING
+		 * clause, this means you might get between 0 and N output rows.
+		 * Offhand I think that's desired.)
+		 */
+		List	   *paths = NIL;
+
+		while (--nrows >= 0)
+		{
+			path = (Path *)
+				create_result_path(root, grouped_rel,
+								   target,
+								   (List *) parse->havingQual);
+			paths = lappend(paths, path);
+		}
+		path = (Path *)
+			create_append_path(grouped_rel,
+							   paths,
+							   NIL,
+							   target,
+							   NULL,
+							   0,
+							   false,
+							   NIL,
+							   -1);
+		path->pathtarget = target;
+	}
+	else
+	{
+		/* No grouping sets, or just one, so one output row */
+		path = (Path *)
+			create_result_path(root, grouped_rel,
+							   target,
+							   (List *) parse->havingQual);
+	}
 
+	add_path(grouped_rel, path);
+
+	/* No need to consider any other alternatives. */
+	set_cheapest(grouped_rel);
+
+	return grouped_rel;
+}
+
+static RelOptInfo *
+make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
+				  bool consider_parallel, bool is_partial)
+{
+	RelOptInfo *grouping_rel;
+
+	/*
+	 * If we are doing partial aggregation, then we need to fetch
+	 * partially grouped rel given by UPPERREL_PARTIAL_GROUP_AGG as it
+	 * stores all partial aggregation paths.
+	 */
+	UpperRelationKind upper_relkind = is_partial ? UPPERREL_PARTIAL_GROUP_AGG :
+		UPPERREL_GROUP_AGG;
+
+	if (IS_OTHER_REL(input_rel))
+	{
 		/*
-		 * Fetch upper rel with input rel's relids, and mark this as "other
-		 * upper rel".
+		 * Now that there can be multiple grouping relations, if we have to
+		 * manage those in the root, we need separate identifiers for those.
+		 * What better identifier than the input relids themselves?
 		 */
-		grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG,
-									  input_rel->relids);
-		grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
-		/* Similarly for the partial upper rel. */
-		partially_grouped_rel = fetch_upper_rel(root,
-												UPPERREL_PARTIAL_GROUP_AGG,
-												input_rel->relids);
-		partially_grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
+		grouping_rel = fetch_upper_rel(root, upper_relkind,
+									   input_rel->relids);
+		grouping_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
 	}
 	else
 	{
@@ -3719,12 +3790,40 @@ create_grouping_paths(PlannerInfo *root,
 		 * upperrel.  Paths that are only partially aggregated go into the
 		 * (UPPERREL_PARTIAL_GROUP_AGG, NULL) upperrel.
 		 */
-		grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
-		partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
-												NULL);
+		grouping_rel = fetch_upper_rel(root, upper_relkind, NULL);
 	}
 
 	/*
+	 * If the input rel belongs to a single FDW, so does the grouping rel whether
+	 * full grouping rel or partial grouping rel.
+	 */
+	grouping_rel->serverid = input_rel->serverid;
+	grouping_rel->userid = input_rel->userid;
+	grouping_rel->useridiscurrent = input_rel->useridiscurrent;
+	grouping_rel->fdwroutine = input_rel->fdwroutine;
+	grouping_rel->consider_parallel = consider_parallel;
+
+	return grouping_rel;
+}
+
+static void
+make_grouping_rels(PlannerInfo *root, RelOptInfo *input_rel,
+				   bool force_partial_agg, GroupPathExtraData *extra,
+				   grouping_sets_data *gd, RelOptInfo **grouped_rel,
+				   RelOptInfo **partially_grouped_rel)
+{
+	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
+	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+	Node		   *havingQual = extra->havingQual;
+	PathTarget	   *target = extra->target;
+	Query		   *parse = root->parse;
+
+	*grouped_rel = NULL;
+	*partially_grouped_rel = NULL;
+
+	Assert(target);
+
+	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
 	 * can't be parallel-safe, either.  Otherwise, it's parallel-safe if the
 	 * target list and HAVING quals are parallel-safe.  The partially grouped
@@ -3733,114 +3832,45 @@ create_grouping_paths(PlannerInfo *root,
 	if (input_rel->consider_parallel &&
 		is_parallel_safe(root, (Node *) target->exprs) &&
 		is_parallel_safe(root, havingQual))
-	{
-		grouped_rel->consider_parallel = true;
-		partially_grouped_rel->consider_parallel = true;
-	}
-
-	/*
-	 * If the input rel belongs to a single FDW, so does the grouped rel. Same
-	 * for the partially_grouped_rel.
-	 */
-	grouped_rel->serverid = input_rel->serverid;
-	grouped_rel->userid = input_rel->userid;
-	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
-	grouped_rel->fdwroutine = input_rel->fdwroutine;
-	partially_grouped_rel->serverid = input_rel->serverid;
-	partially_grouped_rel->userid = input_rel->userid;
-	partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
-	partially_grouped_rel->fdwroutine = input_rel->fdwroutine;
-
-	/*
-	 * Check for degenerate grouping.
-	 */
-	if ((root->hasHavingQual || parse->groupingSets) &&
-		!parse->hasAggs && parse->groupClause == NIL)
-	{
-		/*
-		 * We have a HAVING qual and/or grouping sets, but no aggregates and
-		 * no GROUP BY (which implies that the grouping sets are all empty).
-		 *
-		 * This is a degenerate case in which we are supposed to emit either
-		 * zero or one row for each grouping set depending on whether HAVING
-		 * succeeds.  Furthermore, there cannot be any variables in either
-		 * HAVING or the targetlist, so we actually do not need the FROM table
-		 * at all!	We can just throw away the plan-so-far and generate a
-		 * Result node.  This is a sufficiently unusual corner case that it's
-		 * not worth contorting the structure of this module to avoid having
-		 * to generate the earlier paths in the first place.
-		 */
-		int			nrows = list_length(parse->groupingSets);
-		Path	   *path;
-
-		/*
-		 * Degenerate grouping will never see child relation as there is no
-		 * partitionwise grouping is performed on degenerate grouping case.
-		 */
-		Assert(!IS_OTHER_REL(input_rel));
-
-		if (nrows > 1)
-		{
-			/*
-			 * Doesn't seem worthwhile writing code to cons up a
-			 * generate_series or a values scan to emit multiple rows. Instead
-			 * just make N clones and append them.  (With a volatile HAVING
-			 * clause, this means you might get between 0 and N output rows.
-			 * Offhand I think that's desired.)
-			 */
-			List	   *paths = NIL;
-
-			while (--nrows >= 0)
-			{
-				path = (Path *)
-					create_result_path(root, grouped_rel,
-									   target,
-									   (List *) parse->havingQual);
-				paths = lappend(paths, path);
-			}
-			path = (Path *)
-				create_append_path(grouped_rel,
-								   paths,
-								   NIL,
-								   target,
-								   NULL,
-								   0,
-								   false,
-								   NIL,
-								   -1);
-			path->pathtarget = target;
-		}
-		else
-		{
-			/* No grouping sets, or just one, so one output row */
-			path = (Path *)
-				create_result_path(root, grouped_rel,
-								   target,
-								   (List *) parse->havingQual);
-		}
+		extra->consider_parallel = true;
+	else
+		extra->consider_parallel = false;
 
-		add_path(grouped_rel, path);
+	extra->partitionwise_grouping = can_partitionwise_grouping(root, input_rel,
+															   target,
+															   force_partial_agg,
+															   extra, gd,
+															   &extra->partial_partitionwise_grouping);
 
-		/* No need to consider any other alternatives. */
-		set_cheapest(grouped_rel);
-
-		return grouped_rel;
-	}
+	*grouped_rel = make_grouping_rel(root, input_rel, extra->consider_parallel,
+									 false);
 
 	/*
 	 * Figure out whether a PartialAggregate/Finalize Aggregate execution
 	 * strategy is viable.
 	 */
-	try_parallel_aggregation = can_parallel_agg(root, input_rel, grouped_rel,
-												extra);
+	extra->try_parallel_aggregation = can_parallel_agg(root, input_rel,
+													   *grouped_rel, extra);
 
 	/*
+	 * A partial grouped relation will be required if we are using parallel
+	 * paths or we are allowed to compute only partial aggregates or
+	 * partitionwise grouping is going to use partial aggregation.
+	 *
 	 * Before generating paths for grouped_rel, we first generate any possible
-	 * partial paths for partially_grouped_rel; that way, later code can
-	 * easily consider both parallel and non-parallel approaches to grouping.
+	 * partial paths for partially_grouped_rel; that way, later code can easily
+	 * consider both parallel and non-parallel approaches to grouping.
 	 */
-	if (try_parallel_aggregation || isPartialAgg)
+	if (extra->try_parallel_aggregation || force_partial_agg ||
+		(extra->partitionwise_grouping &&
+		 extra->partial_partitionwise_grouping))
 	{
+		PathTarget *partial_grouping_target;
+
+		*partially_grouped_rel = make_grouping_rel(root, input_rel,
+												   extra->consider_parallel,
+												   true);
+
 		/*
 		 * Build target list for partial aggregate paths.  These paths cannot
 		 * just emit the same tlist as regular aggregate paths, because (1) we
@@ -3848,9 +3878,9 @@ create_grouping_paths(PlannerInfo *root,
 		 * appear in the result tlist, and (2) the Aggrefs must be set in
 		 * partial mode.
 		 */
-		partial_grouping_target = child_data ? child_data->partialRelTarget :
-			make_partial_grouping_target(root, target, havingQual);
-		partially_grouped_rel->reltarget = partial_grouping_target;
+		partial_grouping_target = make_partial_grouping_target(root, target,
+															   havingQual);
+		(*partially_grouped_rel)->reltarget = partial_grouping_target;
 
 		/* Set partial aggregation costs, if not already computed. */
 		if (!extra->partial_costs_set)
@@ -3881,29 +3911,45 @@ create_grouping_paths(PlannerInfo *root,
 			extra->partial_costs_set = true;
 		}
 	}
+}
+
+static void
+populate_grouping_rels_with_paths(PlannerInfo *root,
+								  RelOptInfo *input_rel,
+								  RelOptInfo *grouped_rel,
+								  RelOptInfo *partially_grouped_rel,
+								  grouping_sets_data *gd,
+								  const AggClauseCosts *agg_costs,
+								  bool is_partial_agg,
+								  GroupPathExtraData *extra)
+{
+	Query	   *parse = root->parse;
+	Path	   *cheapest_path = input_rel->cheapest_total_path;
+	PathTarget *grouped_rel_target = extra->target;
+	Node	   *havingQual = extra->havingQual;
 
 	/* Apply partitionwise aggregation technique, if possible. */
-	try_partitionwise_grouping(root, input_rel, grouped_rel,
-							   partially_grouped_rel, target, agg_costs,
-							   gd, extra, havingQual, isPartialAgg);
+	if (extra->partitionwise_grouping)
+		try_partitionwise_grouping(root, input_rel, grouped_rel,
+								   partially_grouped_rel,
+								   agg_costs, gd, extra);
 
 	/*
 	 * Try parallel aggregation, if possible.  This produces only partially
 	 * grouped paths, since the same group could be produced by more than one
 	 * worker.
 	 */
-	if (try_parallel_aggregation)
+	if (extra->try_parallel_aggregation)
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel, gd, extra,
-										  true, isPartialAgg);
+										  true,
+										  extra->partial_partitionwise_grouping);
 
 	/*
-	 * Now generate non-partial paths.  When isPartialAgg = true, we're
-	 * generating paths for a child rel whose partition keys are not contained
-	 * in the grouping keys, so we can only generate partially grouped paths.
-	 * Otherwise, we can do complete grouping.
+	 * Now generate non-parallel paths, partial or full aggregation paths as
+	 * required.
 	 */
-	if (isPartialAgg)
+	if (is_partial_agg)
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel, gd, extra,
 										  false, true);
@@ -3915,12 +3961,14 @@ create_grouping_paths(PlannerInfo *root,
 		dNumGroups = get_number_of_groups(root,
 										  cheapest_path->rows,
 										  gd,
-										  child_data ? make_tlist_from_pathtarget(target) : parse->targetList);
+										  IS_OTHER_REL(grouped_rel) ? make_tlist_from_pathtarget(grouped_rel_target) :
+										  parse->targetList);
 
 		/* Build final grouping paths */
-		add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
-								  partially_grouped_rel, agg_costs, gd, extra,
-								  dNumGroups, (List *) havingQual);
+		add_paths_to_grouping_rel(root, input_rel, grouped_rel,
+								  grouped_rel_target, partially_grouped_rel,
+								  agg_costs, gd, extra, dNumGroups,
+								  (List *) havingQual);
 
 		/* Give a helpful error if we failed to find any implementation */
 		if (grouped_rel->pathlist == NIL)
@@ -3938,7 +3986,7 @@ create_grouping_paths(PlannerInfo *root,
 		grouped_rel->fdwroutine->GetForeignUpperPaths)
 		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
 													  input_rel, grouped_rel,
-													  child_data);
+													  extra);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
@@ -3948,9 +3996,66 @@ create_grouping_paths(PlannerInfo *root,
 	/* Now choose the best path(s) */
 	if (grouped_rel->pathlist)
 		set_cheapest(grouped_rel);
-	if (partially_grouped_rel->pathlist)
+	if (partially_grouped_rel && partially_grouped_rel->pathlist)
 		set_cheapest(partially_grouped_rel);
 
+}
+
+/*
+ * create_grouping_paths
+ *
+ * Build a new upperrel containing Paths for grouping and/or aggregation.
+ * Along the way, we also build an upperrel for Paths which are partially
+ * grouped and/or aggregated.  A partially grouped and/or aggregated path
+ * needs a FinalizeAggregate node to complete the aggregation.  Currently,
+ * the only partially grouped paths we build are also partial paths; that
+ * is, they need a Gather and then a FinalizeAggregate.
+ *
+ * input_rel: contains the source-data Paths
+ * target: the pathtarget for the result Paths to compute
+ * agg_costs: cost info about all aggregates in query (in AGGSPLIT_SIMPLE mode)
+ * rollup_lists: list of grouping sets, or NIL if not doing grouping sets
+ * rollup_groupclauses: list of grouping clauses for grouping sets,
+ *		or NIL if not doing grouping sets
+ *
+ * Note: all Paths in input_rel are expected to return the target computed
+ * by make_group_input_target.
+ *
+ * We need to consider sorted and hashed aggregation in the same function,
+ * because otherwise (1) it would be harder to throw an appropriate error
+ * message if neither way works, and (2) we should not allow hashtable size
+ * considerations to dissuade us from using hashing if sorting is not possible.
+ *
+ * If input rel itself is "other" relation, then we are creating grouping paths
+ * for the child relation. Thus child_data should provide child specific details
+ * like havingQual, whether it should compute partial aggregation or not etc.
+ */
+static RelOptInfo *
+create_grouping_paths(PlannerInfo *root,
+					  RelOptInfo *input_rel,
+					  PathTarget *target,
+					  const AggClauseCosts *agg_costs,
+					  grouping_sets_data *gd)
+{
+	RelOptInfo *grouped_rel;
+	RelOptInfo *partially_grouped_rel;
+	GroupPathExtraData extra;
+
+	compute_group_path_extra_data(root, &extra, gd, agg_costs);
+	extra.target = target;
+	extra.havingQual = root->parse->havingQual;
+
+	grouped_rel = try_degenerate_grouping_paths(root, input_rel, target);
+	if (grouped_rel)
+		return grouped_rel;
+
+	make_grouping_rels(root, input_rel, false, &extra, gd, &grouped_rel,
+					   &partially_grouped_rel);
+
+	populate_grouping_rels_with_paths(root, input_rel, grouped_rel,
+									  partially_grouped_rel, gd, agg_costs,
+									  false, &extra);
+
 	return grouped_rel;
 }
 
@@ -6176,48 +6281,53 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 
 		/*
 		 * Instead of operating directly on the input relation, we can
-		 * consider finalizing a partially aggregated path.
+		 * consider finalizing a partially aggregated path when those are
+		 * available.
 		 */
-		foreach(lc, partially_grouped_rel->pathlist)
+		if (partially_grouped_rel)
 		{
-			Path	   *path = (Path *) lfirst(lc);
-
-			/*
-			 * Insert a Sort node, if required.  But there's no point in
-			 * sorting anything but the cheapest path.
-			 */
-			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+			foreach(lc, partially_grouped_rel->pathlist)
 			{
-				if (path != partially_grouped_rel->cheapest_total_path)
-					continue;
-				path = (Path *) create_sort_path(root,
-												 grouped_rel,
-												 path,
-												 root->group_pathkeys,
-												 -1.0);
-			}
+				Path	   *path = (Path *) lfirst(lc);
 
-			if (parse->hasAggs)
-				add_path(grouped_rel, (Path *)
-						 create_agg_path(root,
-										 grouped_rel,
-										 path,
-										 target,
-										 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
-										 AGGSPLIT_FINAL_DESERIAL,
-										 parse->groupClause,
-										 havingQual,
-										 agg_final_costs,
-										 dNumGroups));
-			else
-				add_path(grouped_rel, (Path *)
-						 create_group_path(root,
-										   grouped_rel,
-										   path,
-										   target,
-										   parse->groupClause,
-										   havingQual,
-										   dNumGroups));
+				/*
+				 * Insert a Sort node, if required.  But there's no point in
+				 * sorting anything but the cheapest path.
+				 */
+				if (!pathkeys_contained_in(root->group_pathkeys,
+										   path->pathkeys))
+				{
+					if (path != partially_grouped_rel->cheapest_total_path)
+						continue;
+					path = (Path *) create_sort_path(root,
+													 grouped_rel,
+													 path,
+													 root->group_pathkeys,
+													 -1.0);
+				}
+
+				if (parse->hasAggs)
+					add_path(grouped_rel, (Path *)
+							 create_agg_path(root,
+											 grouped_rel,
+											 path,
+											 target,
+											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+											 AGGSPLIT_FINAL_DESERIAL,
+											 parse->groupClause,
+											 havingQual,
+											 agg_final_costs,
+											 dNumGroups));
+				else
+					add_path(grouped_rel, (Path *)
+							 create_group_path(root,
+											   grouped_rel,
+											   path,
+											   target,
+											   parse->groupClause,
+											   havingQual,
+											   dNumGroups));
+			}
 		}
 	}
 
@@ -6271,7 +6381,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * grouped path. Once again, we'll only do this if it looks as though
 		 * the hash table won't exceed work_mem.
 		 */
-		if (partially_grouped_rel->pathlist)
+		if (partially_grouped_rel && partially_grouped_rel->pathlist)
 		{
 			Path	   *path = partially_grouped_rel->cheapest_total_path;
 
@@ -6554,7 +6664,7 @@ can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 		/* We don't know how to do grouping sets in parallel. */
 		return false;
 	}
-	else if (extra->can_partial_agg)
+	else if (!extra->can_partial_agg)
 	{
 		/* Insufficient support for partial mode. */
 		return false;
@@ -6658,6 +6768,70 @@ apply_scanjoin_target_to_paths(PlannerInfo *root, RelOptInfo *rel,
 }
 
 /*
+ * can_partitionwise_grouping
+ *	   Can we use partitionwise grouping technique?
+ */
+static bool
+can_partitionwise_grouping(PlannerInfo *root, RelOptInfo *input_rel,
+						   PathTarget *target, bool force_partial_agg,
+						   GroupPathExtraData *extra, grouping_sets_data *gd,
+						   bool *is_partial_agg)
+{
+	Query	   *parse = root->parse;
+
+	*is_partial_agg = false;
+
+	/* No, if user disabled partitionwise aggregation. */
+	if (!enable_partitionwise_aggregate)
+		return false;
+
+	/*
+	 * Currently, grouping sets plan does not work with an inheritance subtree
+	 * (see notes in create_groupingsets_plan). Moreover, grouping sets
+	 * implies multiple group by clauses, each of which may not have all
+	 * partition keys. Those sets which have all partition keys will be
+	 * computed completely for each partition, but others will require partial
+	 * aggregation. We will need to apply partitionwise aggregation at each
+	 * derived group by clause and not as a whole-sale strategy.  Due to this
+	 * we won't be able to compute "whole" grouping sets here and thus bail
+	 * out.
+	 */
+	if (parse->groupingSets || gd)
+		return false;
+
+	/*
+	 * Nothing to do, if the input relation is not partitioned or it has no
+	 * partitioned relations.
+	 */
+	if (!input_rel->part_scheme || !input_rel->part_rels)
+		return false;
+
+	/* Nothing to do, if the input relation itself is dummy. */
+	if (IS_DUMMY_REL(input_rel))
+		return false;
+
+	/*
+	 * If partition keys are part of group by clauses, then we can do full
+	 * partitionwise aggregation.  Otherwise need to calculate partial
+	 * aggregates for each partition and combine them.
+	 *
+	 * However, if caller forces to perform partial aggregation, then do that
+	 * unconditionally.
+	 */
+	*is_partial_agg = (force_partial_agg ||
+					   !group_by_has_partkey(input_rel, target,
+											 parse->groupClause));
+	/*
+	 * If we need to perform partial aggregation but can not compute partial
+	 * aggregates, no partitionwise grouping is possible.
+	 */
+	if (*is_partial_agg && !extra->can_partial_agg)
+		return false;
+
+	return true;
+}
+
+/*
  * try_partitionwise_grouping
  *
  * If the partition keys of input relation are part of group by clause, all the
@@ -6692,77 +6866,15 @@ try_partitionwise_grouping(PlannerInfo *root,
 						   RelOptInfo *input_rel,
 						   RelOptInfo *grouped_rel,
 						   RelOptInfo *partially_grouped_rel,
-						   PathTarget *target,
 						   const AggClauseCosts *agg_costs,
 						   grouping_sets_data *gd,
-						   GroupPathExtraData *extra,
-						   Node *havingQual,
-						   bool forcePartialAgg)
+						   GroupPathExtraData *extra)
 {
-	Query	   *query = root->parse;
 	int			nparts;
 	int			cnt_parts;
 	RelOptInfo **part_rels;
 	List	   *live_children = NIL;
-	OtherUpperPathExtraData child_data;
-	bool		isPartialAgg = false;
-
-	/* Nothing to do, if user disabled partitionwise aggregation. */
-	if (!enable_partitionwise_aggregate)
-		return;
-
-	/*
-	 * Currently, grouping sets plan does not work with an inheritance subtree
-	 * (see notes in create_groupingsets_plan). Moreover, grouping sets
-	 * implies multiple group by clauses, each of which may not have all
-	 * partition keys. Those sets which have all partition keys will be
-	 * computed completely for each partition, but others will require partial
-	 * aggregation. We will need to apply partitionwise aggregation at each
-	 * derived group by clause and not as a whole-sale strategy.  Due to this
-	 * we won't be able to compute "whole" grouping sets here and thus bail
-	 * out.
-	 */
-	if (query->groupingSets || gd)
-		return;
-
-	/*
-	 * Nothing to do, if the input relation is not partitioned or it has no
-	 * partitioned relations.
-	 */
-	if (!input_rel->part_scheme || !input_rel->part_rels)
-		return;
-
-	/* Nothing to do, if the input relation itself is dummy. */
-	if (IS_DUMMY_REL(input_rel))
-		return;
-
-	/*
-	 * If partition keys are part of group by clauses, then we can do full
-	 * partitionwise aggregation.  Otherwise need to calculate partial
-	 * aggregates for each partition and combine them.
-	 *
-	 * However, if caller forces to perform partial aggregation, then do that
-	 * unconditionally.
-	 */
-	if (forcePartialAgg ||
-		!group_by_has_partkey(input_rel, target, query->groupClause))
-	{
-		/*
-		 * Need to perform partial aggregation.  However check whether we can
-		 * do aggregation in partial or not.  If no, then return.
-		 */
-		if (agg_costs->hasNonPartial || agg_costs->hasNonSerial)
-			return;
-
-		/* Safe to perform partial aggregation */
-		isPartialAgg = true;
-	}
-
-	/*
-	 * Set isPartialAgg flag in OtherUpperPathExtraData. This flag is required
-	 * at places where aggregation path is created, like postgres_fdw.
-	 */
-	child_data.isPartialAgg = isPartialAgg;
+	PathTarget *target = extra->target;
 
 	nparts = input_rel->nparts;
 	part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
@@ -6774,7 +6886,7 @@ try_partitionwise_grouping(PlannerInfo *root,
 	 * the partitioning details for this grouped rel. In case of a partial
 	 * aggregation, this is not true.
 	 */
-	if (!isPartialAgg)
+	if (!extra->partial_partitionwise_grouping)
 	{
 		grouped_rel->part_scheme = input_rel->part_scheme;
 		grouped_rel->nparts = nparts;
@@ -6787,23 +6899,41 @@ try_partitionwise_grouping(PlannerInfo *root,
 	{
 		RelOptInfo *input_child_rel = input_rel->part_rels[cnt_parts];
 		PathTarget *child_target = copy_pathtarget(target);
-		PathTarget *partial_target;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
 		PathTarget *scanjoin_target;
+		GroupPathExtraData child_extra;
+		RelOptInfo	*child_grouped_rel;
+		RelOptInfo  *child_partially_grouped_rel;
 
 		/*
-		 * Now that there can be multiple grouping relations, if we have to
-		 * manage those in the root, we need separate identifiers for those.
-		 * What better identifier than the input relids themselves?
-		 *
-		 * However, if we are doing partial aggregation, then we need to fetch
-		 * partially grouped rel given by UPPERREL_PARTIAL_GROUP_AGG as it
-		 * stores all partial aggregation paths.
+		 * Copy the given "extra" structure as is. make_grouping_rels() will
+		 * override the members specific to this child.
 		 */
-		part_rels[cnt_parts] = fetch_upper_rel(root,
-											   isPartialAgg ? UPPERREL_PARTIAL_GROUP_AGG : UPPERREL_GROUP_AGG,
-											   input_child_rel->relids);
+		memcpy(&child_extra, extra, sizeof(child_extra));
+
+		appinfos = find_appinfos_by_relids(root, input_child_rel->relids,
+										   &nappinfos);
+
+		child_target->exprs = (List *) adjust_appendrel_attrs(root,
+															  (Node *) target->exprs,
+															  nappinfos,
+															  appinfos);
+		child_extra.target = child_target;
+		child_extra.havingQual = (Node *) adjust_appendrel_attrs(root,
+																 extra->havingQual,
+																 nappinfos,
+																 appinfos);
+
+		make_grouping_rels(root, input_child_rel,
+						   extra->partial_partitionwise_grouping,
+						   &child_extra, gd, &child_grouped_rel,
+						   &child_partially_grouped_rel);
+
+		if (extra->partial_partitionwise_grouping)
+			part_rels[cnt_parts] = child_partially_grouped_rel;
+		else
+			part_rels[cnt_parts] = child_grouped_rel;
 
 		/* Input child rel must have a path */
 		Assert(input_child_rel->pathlist != NIL);
@@ -6811,15 +6941,16 @@ try_partitionwise_grouping(PlannerInfo *root,
 		/* Ignore empty children. They contribute nothing. */
 		if (IS_DUMMY_REL(input_child_rel))
 		{
-			mark_dummy_rel(part_rels[cnt_parts]);
+			mark_dummy_rel(child_grouped_rel);
+
+			if (child_partially_grouped_rel)
+				mark_dummy_rel(child_partially_grouped_rel);
+
 			continue;
 		}
 		else
 			live_children = lappend(live_children, part_rels[cnt_parts]);
 
-		appinfos = find_appinfos_by_relids(root, input_child_rel->relids,
-										   &nappinfos);
-
 		/*
 		 * Copy pathtarget from underneath scan/join as we are modifying it
 		 * and translate its Vars with respect to this appendrel.  We use
@@ -6840,33 +6971,13 @@ try_partitionwise_grouping(PlannerInfo *root,
 		apply_scanjoin_target_to_paths(root, input_child_rel, scanjoin_target,
 									   false);
 
-		child_target->exprs = (List *) adjust_appendrel_attrs(root,
-															  (Node *) target->exprs,
-															  nappinfos,
-															  appinfos);
-
-		/*
-		 * Parallel aggregation requires partial target, so compute it here
-		 * and translate all vars. For partial aggregation, we need it
-		 * anyways.
-		 */
-		partial_target = make_partial_grouping_target(root, target,
-													  havingQual);
-		partial_target->exprs = (List *) adjust_appendrel_attrs(root,
-																(Node *) partial_target->exprs,
-																nappinfos,
-																appinfos);
-
-		child_data.relTarget = child_target;
-		child_data.partialRelTarget = partial_target;
-		child_data.havingQual = (Node *) adjust_appendrel_attrs(root,
-																havingQual,
-																nappinfos,
-																appinfos);
-
 		/* Create grouping paths for this child relation. */
-		create_grouping_paths(root, input_child_rel, child_target, agg_costs,
-							  gd, extra, &child_data);
+		populate_grouping_rels_with_paths(root, input_child_rel,
+										  child_grouped_rel,
+										  child_partially_grouped_rel, gd,
+										  agg_costs,
+										  extra->partial_partitionwise_grouping,
+										  &child_extra);
 
 		pfree(appinfos);
 	}
@@ -6881,11 +6992,11 @@ try_partitionwise_grouping(PlannerInfo *root,
 	 * Finally create append rel for all children and stick them into the
 	 * grouped_rel or partially_grouped_rel.
 	 */
-	if (isPartialAgg)
+	if (extra->partial_partitionwise_grouping)
 	{
+		Assert(partially_grouped_rel);
 		add_paths_to_append_rel(root, partially_grouped_rel,
-								make_partial_grouping_target(root, target,
-															 havingQual),
+								partially_grouped_rel->reltarget,
 								live_children);
 
 		/*
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index f56f19a..113e835 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2300,14 +2300,12 @@ typedef struct JoinPathExtraData
  * for other upper rels.
  *
  * relTarget is the PathTarget for this upper rel
- * partialRelTarget is the partial PathTarget for this upper rel
  * isPartialAgg is true if we are creating a partial aggregation path
  * havingQual is the quals applied to this upper rel
  */
 typedef struct
 {
 	PathTarget *relTarget;
-	PathTarget *partialRelTarget;
 	bool		isPartialAgg;
 	Node	   *havingQual;
 } OtherUpperPathExtraData;

#113

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#112)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Mar 8, 2018 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:

This kind of goes along with the suggestion I made yesterday to
introduce a new function, which at the time I proposed calling
initialize_grouping_rel(), to set up new grouped or partially grouped
relations. By doing that it would be easier to ensure the
initialization is always done in a consistent way but only for the
relations we actually need. But maybe we should call it
fetch_grouping_rel() instead. The idea would be that instead of
calling fetch_upper_rel() we would call fetch_grouping_rel() when it
is a question of the grouped or partially grouped relation. It would
either return the existing relation or initialize a new one for us. I
think that would make it fairly easy to initialize only the ones we're
going to need.

Hmm. I am working on refactoring the code to do something like this.

Here's a patch doing the same. I have split create_grouping_paths() into
three functions:
1. try_degenerate_grouping_paths(), to handle degenerate grouping paths;
2. make_grouping_rels(), to create the grouping rels (the partially grouped
rel and the grouped rel), which also sets some properties in
GroupPathExtraData;
3. populate_grouping_rels_with_paths(), to populate the grouping rels with
paths.
With those changes, I have been able to get rid of partially grouped rels
when they are not necessary. But I haven't tried to get rid of grouped_rels
when they are not needed.

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

With those two things remaining I am posting this patch, so that
Jeevan Chalke can merge this patch into his patches and also merge
some of his changes related to mine and Robert's changes. Let me know
if this refactoring looks good.

Thanks, Ashutosh, for the refactoring patch.
I will rebase my changes and will also resolve the two issues you reported
above.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#114

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#112)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

Initially, I was passing OtherUpperPathExtraData to the FDW. But now we need
to pass GroupPathExtraData.

However, since GetForeignUpperPaths() is a generic function for all upper
relations, we might think of renaming this struct to UpperPathExtraData and
adding an UpperRelationKind member to it, which would be used to distinguish
the passed-in extra data. But since we currently have extra data for grouping
only, I chose not to do that here. Someone may choose this approach later,
when needed.
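
For illustration only, the single-struct idea could look roughly like the sketch below. It is not part of the patch: every name with a Tagged/TAGGED prefix is hypothetical, and the window-only member is invented purely to show members that are unused for a given kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for UpperRelationKind; only two kinds are modeled. */
typedef enum
{
    TAGGED_KIND_GROUP_AGG,
    TAGGED_KIND_WINDOW
} TaggedUpperRelKind;

/*
 * Hypothetical "UpperPathExtraData": one struct for every upper rel, with a
 * kind member telling the callee which of the other members are meaningful.
 */
typedef struct
{
    TaggedUpperRelKind kind;

    /* Meaningful only when kind == TAGGED_KIND_GROUP_AGG. */
    bool        can_partial_agg;
    bool        partial_partitionwise_grouping;

    /* Invented member, meaningful only when kind == TAGGED_KIND_WINDOW. */
    int         num_window_clauses;
} TaggedUpperPathExtraData;

/* The callee must check the tag before trusting the grouping members. */
bool
tagged_wants_partial_grouping(const TaggedUpperPathExtraData *extra)
{
    assert(extra != NULL);

    if (extra->kind != TAGGED_KIND_GROUP_AGG)
        return false;

    return extra->partial_partitionwise_grouping;
}
```

The cost of this layout is that each caller carries members that mean nothing for its own kind, and nothing but the tag check stops a callee from reading them.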

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

I fixed it. We need to pass is_partial_agg instead of
extra->partial_partitionwise_grouping while calling
add_paths_to_partial_grouping_rel() in case of parallelism.

With those two things remaining I am posting this patch, so that
Jeevan Chalke can merge this patch into his patches and also merge
some of his changes related to mine and Robert's changes. Let me know
if this refactoring looks good.

Will rebase my changes tomorrow.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

fix_issues_from_AB_refactoring_changes.patch (text/x-patch; charset=US-ASCII)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 87ea18c..ed16f7d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -353,7 +353,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 UpperRelationKind stage,
 							 RelOptInfo *input_rel,
 							 RelOptInfo *output_rel,
-							 OtherUpperPathExtraData *extra);
+							 GroupPathExtraData *group_extra);
 
 /*
  * Helper functions
@@ -429,7 +429,7 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 static void add_foreign_grouping_paths(PlannerInfo *root,
 						   RelOptInfo *input_rel,
 						   RelOptInfo *grouped_rel,
-						   OtherUpperPathExtraData *extra);
+						   GroupPathExtraData *group_extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -5235,7 +5235,7 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 							 RelOptInfo *input_rel, RelOptInfo *output_rel,
-							 OtherUpperPathExtraData *extra)
+							 GroupPathExtraData *group_extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5255,7 +5255,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel, extra);
+	add_foreign_grouping_paths(root, input_rel, output_rel, group_extra);
 }
 
 /*
@@ -5268,7 +5268,7 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 						   RelOptInfo *grouped_rel,
-						   OtherUpperPathExtraData *extra)
+						   GroupPathExtraData *group_extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
@@ -5288,16 +5288,16 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	 * Store passed-in target and havingQual in fpinfo. If its a foreign
 	 * partition, then path target and HAVING quals fetched from the root are
 	 * not correct as Vars within it won't match with this child relation.
-	 * However, server passed them through extra and thus fetch from it.
+	 * However, server passed them through group_extra and thus fetch from it.
 	 */
-	if (extra)
+	if (group_extra)
 	{
 		/* Partial aggregates are not supported. */
-		if (extra->isPartialAgg)
+		if (group_extra->partial_partitionwise_grouping)
 			return;
 
-		fpinfo->grouped_target = extra->relTarget;
-		fpinfo->havingQual = extra->havingQual;
+		fpinfo->grouped_target = group_extra->target;
+		fpinfo->havingQual = group_extra->havingQual;
 	}
 	else
 	{
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1611975..9a34bf9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -109,43 +109,6 @@ typedef struct
 	int		   *tleref_to_colnum_map;
 } grouping_sets_data;
 
-/*
- * Struct for extra information passed to create_grouping_paths
- *
- * can_hash is true if hash-based grouping is possible, false otherwise.
- * can_sort is true if sort-based grouping is possible, false otherwise.
- * can_partial_agg is true if partial aggregation is possible, false otherwise.
- * partial_costs_set indicates whether agg_partial_costs and agg_final_costs
- *		have valid costs set. Both of those are computed only when partial
- *		aggregation is required.
- * agg_partial_costs gives partial aggregation costs.
- * agg_final_costs gives finalization costs.
- */
-typedef struct
-{
-	/* Data which remains constant once set. */
-	bool		can_hash;
-	bool		can_sort;
-	bool		can_partial_agg;
-	bool		partial_costs_set;
-	AggClauseCosts agg_partial_costs;
-	AggClauseCosts agg_final_costs;
-
-	/*
-	 * Data which depends upon the input and grouping relations and hence may
-	 * change across partitioning hierarchy
-	 */
-	bool		consider_parallel;	/* It probably remains same across
-									 * partitioning hierarchy. But double
-									 * check.
-									 */
-	bool		try_parallel_aggregation;
-	bool		partitionwise_grouping;
-	bool		partial_partitionwise_grouping;
-	Node	   *havingQual;
-	PathTarget *target;
-} GroupPathExtraData;
-
 /* Local functions */
 static Node *preprocess_expression(PlannerInfo *root, Node *expr, int kind);
 static void preprocess_qual_conditions(PlannerInfo *root, Node *jtnode);
@@ -3942,8 +3905,7 @@ populate_grouping_rels_with_paths(PlannerInfo *root,
 	if (extra->try_parallel_aggregation)
 		add_paths_to_partial_grouping_rel(root, input_rel,
 										  partially_grouped_rel, gd, extra,
-										  true,
-										  extra->partial_partitionwise_grouping);
+										  true, is_partial_agg);
 
 	/*
 	 * Now generate non-parallel paths, partial or full aggregation paths as
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 1c2b1fc..6d94c73 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -63,7 +63,7 @@ typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
 											   UpperRelationKind stage,
 											   RelOptInfo *input_rel,
 											   RelOptInfo *output_rel,
-											   OtherUpperPathExtraData *extra);
+											   GroupPathExtraData *group_extra);
 
 typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
 												  RangeTblEntry *target_rte,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 113e835..55edd7c 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2296,19 +2296,41 @@ typedef struct JoinPathExtraData
 } JoinPathExtraData;
 
 /*
- * Struct for extra information passed to subroutines involving path creation
- * for other upper rels.
+ * Struct for extra information passed to create_grouping_paths
  *
- * relTarget is the PathTarget for this upper rel
- * isPartialAgg is true if we are creating a partial aggregation path
- * havingQual is the quals applied to this upper rel
+ * can_hash is true if hash-based grouping is possible, false otherwise.
+ * can_sort is true if sort-based grouping is possible, false otherwise.
+ * can_partial_agg is true if partial aggregation is possible, false otherwise.
+ * partial_costs_set indicates whether agg_partial_costs and agg_final_costs
+ *		have valid costs set. Both of those are computed only when partial
+ *		aggregation is required.
+ * agg_partial_costs gives partial aggregation costs.
+ * agg_final_costs gives finalization costs.
  */
 typedef struct
 {
-	PathTarget *relTarget;
-	bool		isPartialAgg;
+	/* Data which remains constant once set. */
+	bool		can_hash;
+	bool		can_sort;
+	bool		can_partial_agg;
+	bool		partial_costs_set;
+	AggClauseCosts agg_partial_costs;
+	AggClauseCosts agg_final_costs;
+
+	/*
+	 * Data which depends upon the input and grouping relations and hence may
+	 * change across partitioning hierarchy
+	 */
+	bool		consider_parallel;	/* It probably remains same across
+									 * partitioning hierarchy. But double
+									 * check.
+									 */
+	bool		try_parallel_aggregation;
+	bool		partitionwise_grouping;
+	bool		partial_partitionwise_grouping;
 	Node	   *havingQual;
-} OtherUpperPathExtraData;
+	PathTarget *target;
+} GroupPathExtraData;
 
 /*
  * For speed reasons, cost estimation for join paths is performed in two

#115

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#114)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat

GroupPathExtraData now completely absorbs members from and replaces
OtherUpperPathExtraData. This means that we have to consider a way to
pass GroupPathExtraData to FDWs through GetForeignUpperPaths(). I
haven't tried it in this patch.

Initially, I was passing OtherUpperPathExtraData to the FDW. But now we need
to pass GroupPathExtraData.

However, since GetForeignUpperPaths() is a generic function for all upper
relations, we might think of renaming this struct to UpperPathExtraData and
adding an UpperRelationKind member to it, which would be used to distinguish
the passed-in extra data. But since we currently have extra data for grouping
only, I chose not to do that here. Someone may choose this approach later,
when needed.

We don't need an UpperRelationKind member in that structure. That will be
provided by the RelOptInfo passed.

The problem here is that the extra information required for grouping is not
going to be the same as that needed for window aggregates, and certainly
not for ordering. If we try to jam everything into the same structure, it
will become large, with many members useless for a given operation. A
reader will have no idea which of them are useful and which are not. So
instead we should try some polymorphism. I think we can pass a void * to
GetForeignUpperPaths(), and the corresponding FDW hook will know what to
cast it to based on the UpperRelationKind passed.
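
A minimal sketch of that idea, assuming nothing about the final API: the Sketch* types below are simplified stand-ins for UpperRelationKind and GroupPathExtraData, the ordered-stage struct is invented for contrast, and sketch_get_foreign_upper_paths() only models the cast an FDW hook would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for UpperRelationKind; only two stages are modeled here. */
typedef enum
{
    SKETCH_UPPERREL_GROUP_AGG,
    SKETCH_UPPERREL_ORDERED
} SketchUpperRelKind;

/* Stand-in for GroupPathExtraData, reduced to the one member used below. */
typedef struct
{
    bool        partial_partitionwise_grouping;
} SketchGroupExtra;

/* Invented extra data for another stage, to show a second payload type. */
typedef struct
{
    double      limit_tuples;
} SketchOrderedExtra;

/*
 * Model of an FDW upper-paths hook: "extra" is opaque, and "stage" tells the
 * hook which struct it points to.  Returns true if the hook would go on to
 * add foreign paths for this stage.
 */
bool
sketch_get_foreign_upper_paths(SketchUpperRelKind stage, void *extra)
{
    switch (stage)
    {
        case SKETCH_UPPERREL_GROUP_AGG:
            {
                SketchGroupExtra *group_extra = (SketchGroupExtra *) extra;

                assert(group_extra != NULL);
                /* Mirror the patch: partial aggregates are not pushed down. */
                return !group_extra->partial_partitionwise_grouping;
            }
        case SKETCH_UPPERREL_ORDERED:
            {
                SketchOrderedExtra *ordered_extra = (SketchOrderedExtra *) extra;

                assert(ordered_extra != NULL);
                return ordered_extra->limit_tuples >= 0;
            }
    }

    return false;
}
```

Because the compiler cannot check the cast, the mapping from each stage to its struct type has to be written down somewhere, which is the documentation point raised next.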

BTW, the patch has added an argument to GetForeignUpperPaths() but has
not documented the change in API. If we go the route of polymorphism,
we will need to document the mapping between UpperRelationKind and the
type of structure passed in.

With this patch there's a failure in partition_aggregation where the
patch is creating paths with MergeAppend with GatherMerge underneath.
I think this is related to the call
add_paths_to_partial_grouping_rel() when try_parallel_aggregation is
true. But I didn't investigate it further.

I fixed it. We need to pass is_partial_agg instead of
extra->partial_partitionwise_grouping while calling
add_paths_to_partial_grouping_rel() in case of parallelism.

Thanks for the investigation and the fix.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#116

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#115)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi,

In this new patch-set, I have resolved all the comments/issues reported.

The changes Ashutosh Bapat made to split create_grouping_paths() into
separate functions, so that the partitionwise aggregate code can use them,
were based on my partitionwise aggregate changes. Those were essentially
refactoring changes, so I have separated them out and placed them before
any partitionwise changes (see
0005-Split-create_grouping_paths-and-Add-GroupPathExtraDa.patch). I have
then rebased all partitionwise changes, including all fixes, on top of it.

The patch-set is complete now. However, there is still scope for improving
some comments after all this refactoring; I will work on that. I also need
to update some documentation, fix indentation, etc., and will post those
changes in the next patch-set. Meanwhile, this patch-set is ready for review.

On Tue, Mar 13, 2018 at 9:12 AM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Mon, Mar 12, 2018 at 7:49 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Mon, Mar 12, 2018 at 6:07 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 9, 2018 at 4:21 PM, Ashutosh Bapat

We don't need UpperRelationKind member in that structure. That will be
provided by the RelOptInfo passed.

The problem here is that the extra information required for grouping is
not going to be the same as that needed for window aggregates, and
certainly not for ordering. If we try to jam everything into the same
structure, it will become large, with many members useless for a given
operation. A reader will have no idea which of them are useful and
which are not. So, instead, we should try some polymorphism. I think we
can pass a void * to GetForeignUpperPaths(), and the corresponding FDW
hook will know what to cast it to based on the UpperRelationKind
passed.

Yep. Done this way.

BTW, the patch has added an argument to GetForeignUpperPaths() but has
not documented the change in API. If we go the route of polymorphism,
we will need to document the mapping between UpperRelationKind and the
type of structure passed in.

Oops. Will do that in the next patchset.

Thanks for pointing that out; I missed this in the first place.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v17.tar.gz (application/gzip)
#117

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#116)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 13, 2018 at 7:43 PM, Jeevan Chalke <
jeevan.chalke@enterprisedb.com> wrote:

Hi,

The patch-set is complete now. There is still scope for some comment
improvements after all this refactoring; I will work on that. I also need
to update some documentation and indentation, and will post those changes
in the next patch-set. Meanwhile, this patch-set is ready for review.

Attached complete patch-set.

Fixed all remaining documentation, indentation and comments.
Also incorporated a few comment improvements provided off-list by Ashutosh
Bapat.

Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#118

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#117)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 14, 2018 at 7:51 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

On Tue, Mar 13, 2018 at 7:43 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hi,

The patch-set is complete now. There is still scope for some comment
improvements after all this refactoring; I will work on that. I also need
to update some documentation and indentation, and will post those changes
in the next patch-set. Meanwhile, this patch-set is ready for review.

Attached complete patch-set.

Fixed all remaining documentation, indentation and comments.
Also incorporated a few comment improvements provided off-list by Ashutosh
Bapat.

The patchset needs a rebase. I have rebased it on the latest head
and made the following changes.

Argument force_partial_agg was added after the output arguments of
make_grouping_rels(). Moved it before the output arguments to keep input and
output arguments separate.

Also moved the comment about creating partial aggregate paths before full
aggregate paths from make_grouping_rels() to
populate_grouping_rels_with_paths().

In create_grouping_paths(), moved the call to try_degenerate_grouping_paths()
before the computation of extra grouping information.

Some more comment changes in the attached patch set.

+         *
+         * With partitionwise aggregates, we may have childrel's pathlist empty
+         * if we are doing partial aggregation. Thus do this only if childrel's
+         * pathlist is not NIL.
          */
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
             accumulate_append_subpath(childrel->cheapest_total_path,
                                       &subpaths, NULL);
I thought we got rid of this code. Why has it come back? Maybe the comment
should point to a function where this case happens.

In populate_grouping_rels_with_paths(), we need to pass extra to the
extension hook create_upper_paths_hook, similar to what
add_paths_to_joinrel() does.

Also, we aren't passing is_partial_agg to FDW hook, so it won't know
whether to compute partial or full aggregates. I think the check
    /* Partial aggregates are not supported. */
    if (extra->partial_partitionwise_grouping)
        return;
in add_foreign_grouping_paths() is wrong. It's checking whether the
children of the given relation will produce partial aggregates or not.
But it is supposed to check whether the given relation should produce
partial aggregates or not. I think we need to include is_partial_agg
in GroupPathExtraData so that it gets passed to FDWs. If we do so, we
need to add a comment in the prologue of GroupPathExtraData to
disambiguate usage of is_partial_agg and
partial_partitionwise_grouping.
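
To make the distinction between the two flags concrete, here is a minimal, self-contained sketch. It is not PostgreSQL code: GroupPathExtraSketch and both helper functions are hypothetical stand-ins, and only the two flags discussed above are modeled. It assumes a parent relation whose children compute partial aggregates that the parent then finalizes.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for GroupPathExtraData. Only the two flags from the
 * discussion are modeled; the real structure carries much more.
 */
typedef struct GroupPathExtraSketch
{
    /* Must THIS relation produce partial (not yet finalized) aggregates? */
    bool        is_partial_agg;
    /* Will the CHILDREN of this relation produce partial aggregates? */
    bool        partial_partitionwise_grouping;
} GroupPathExtraSketch;

/* The check as quoted: it looks at what the children will produce. */
bool
quoted_check_skips_rel(const GroupPathExtraSketch *extra)
{
    return extra->partial_partitionwise_grouping;
}

/* The check as proposed: it looks at what this relation must produce. */
bool
proposed_check_skips_rel(const GroupPathExtraSketch *extra)
{
    return extra->is_partial_agg;
}
```

With these definitions, a parent that finalizes its children's partial aggregates (is_partial_agg false, partial_partitionwise_grouping true) is skipped by the quoted check but not by the proposed one, while a child that must produce partial aggregates shows the opposite result. An FDW that cannot compute partial aggregates would want the second behaviour.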

In the current create_grouping_paths() (without any of your patches
applied), we first create partial paths in the partially grouped rel and
then add a parallel path to the grouped rel using those partial paths. Then
we hand this over to the FDW and extension hooks, which may add partial
paths. Adding those might throw away a partial path that was used to create a
parallel path in the grouped rel, causing a segfault. I think this bug has
existed since we introduced parallel aggregation or the upper relation
refactoring, whichever happened later. The introduction of the partially
grouped rel has just made it visible.
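
The ordering hazard described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not planner code: PathSketch, RelSketch, GatherSketch and all functions below are hypothetical, a path is reduced to a single cost, and the rel keeps only its cheapest partial path. The sketch shows one safe ordering, in which every source of partial paths (including the stand-in for a hook) runs before the parallel path is built.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model: a path is just a cost. */
typedef struct PathSketch { double cost; } PathSketch;

/* A rel that keeps only its single cheapest partial path. */
typedef struct RelSketch { PathSketch *cheapest_partial; } RelSketch;

/* A parallel (Gather-like) path that points at a partial path. */
typedef struct GatherSketch { PathSketch *subpath; } GatherSketch;

PathSketch *
make_path_sketch(double cost)
{
    PathSketch *p = malloc(sizeof *p);

    if (p == NULL)
        abort();
    p->cost = cost;
    return p;
}

/*
 * Takes ownership of new_path. A dominated path is freed, so any pointer to
 * it that was stored elsewhere beforehand would be left dangling.
 */
void
add_partial_path_sketch(RelSketch *rel, PathSketch *new_path)
{
    if (rel->cheapest_partial == NULL)
        rel->cheapest_partial = new_path;
    else if (new_path->cost < rel->cheapest_partial->cost)
    {
        free(rel->cheapest_partial);    /* old path is thrown away */
        rel->cheapest_partial = new_path;
    }
    else
        free(new_path);
}

/*
 * Safe ordering: the core path and the hook's path are both added first,
 * and only then is the parallel path built on top of the survivor.
 */
GatherSketch
plan_grouping_sketch(RelSketch *partial_rel, double core_cost, double hook_cost)
{
    GatherSketch gather;

    add_partial_path_sketch(partial_rel, make_path_sketch(core_cost));
    add_partial_path_sketch(partial_rel, make_path_sketch(hook_cost));
    gather.subpath = partial_rel->cheapest_partial;
    return gather;
}
```

If the parallel path were built between the two add_partial_path_sketch() calls instead, a cheaper path from the hook would free the path it points to, which is the kind of dangling reference the paragraph above describes.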

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#119

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#118)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 3:38 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

The patchset needs a rebase. I have rebased it on the latest head
and made the following changes.

Argument force_partial_agg was added after the output arguments of
make_grouping_rels(). Moved it before the output arguments to keep input and
output arguments separate.

Also moved the comment about creating partial aggregate paths before full
aggregate paths from make_grouping_rels() to
populate_grouping_rels_with_paths().

In create_grouping_paths(), moved the call to try_degenerate_grouping_paths()
before the computation of extra grouping information.

Thanks for these changes.

Some more comment changes in the attached patch set.

+         *
+         * With partitionwise aggregates, we may have childrel's pathlist empty
+         * if we are doing partial aggregation. Thus do this only if childrel's
+         * pathlist is not NIL.
          */
-        if (childrel->cheapest_total_path->param_info == NULL)
+        if (childrel->pathlist != NIL &&
+            childrel->cheapest_total_path->param_info == NULL)
             accumulate_append_subpath(childrel->cheapest_total_path,
                                       &subpaths, NULL);
I thought we got rid of this code. Why has it come back? Maybe the comment
should point to a function where this case happens.

Oops, my mistake. I added it back while working on your comments. At that
time we were creating partially_grouped_rel unconditionally, and I was
getting a segfault here.

But now, thanks to your refactoring work, we create partially_grouped_rel
only when needed, so this guard is no longer necessary. Removed in the
attached patch set.

In populate_grouping_rels_with_paths(), we need to pass extra to the
extension hook create_upper_paths_hook, similar to what
add_paths_to_joinrel() does.

Hmm.. you are right. Done.

Also, we aren't passing is_partial_agg to FDW hook, so it won't know
whether to compute partial or full aggregates. I think the check
    /* Partial aggregates are not supported. */
    if (extra->partial_partitionwise_grouping)
        return;
in add_foreign_grouping_paths() is wrong. It's checking whether the
children of the given relation will produce partial aggregates or not.
But it is supposed to check whether the given relation should produce
partial aggregates or not. I think we need to include is_partial_agg
in GroupPathExtraData so that it gets passed to FDWs. If we do so, we
need to add a comment in the prologue of GroupPathExtraData to
disambiguate usage of is_partial_agg and
partial_partitionwise_grouping.

Sorry, I misinterpreted this.
Reworked and added is_partial_aggregation to GroupPathExtraData.

In the current create_grouping_paths() (without any of your patches
applied), we first create partial paths in the partially grouped rel and
then add a parallel path to the grouped rel using those partial paths. Then
we hand this over to the FDW and extension hooks, which may add partial
paths. Adding those might throw away a partial path that was used to create a
parallel path in the grouped rel, causing a segfault. I think this bug has
existed since we introduced parallel aggregation or the upper relation
refactoring, whichever happened later. The introduction of the partially
grouped rel has just made it visible.

Yes. That's why I needed to create partitionwise aggregation paths before
we create any normal paths.
However, if this issue needs to be taken care of, it should be a separate
patch.

Also, as noticed by Ashutosh Bapat, after the refactoring we no longer merely
try to create partitionwise paths; we do create them whenever possible. So, in
line with create_grouping_paths(), I have renamed the function to
create_partitionwise_grouping_paths() in the attached patch-set.

Thanks

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v20.tar.gz (application/gzip)
�!*����TA�^��V�
r5j^�y_	�Y;\��`��D��i�	gvvD���)L�@��M������n0:�W�!c��oT@�7���0�]��;1�=fUg?��$	GQ�\��}w�>Z��������|�Q��o8�6�y8������h�}����!:�;�E��a�M~7�-�T���c����w'���D��B���|�<����ctd��$d� >��T�=�~����S��Q�����pNB��X;��IEU!�)�Y=�iN�.���O���nt%�xV5���HY�p6.���x�L�CM0��
hB�=���0B���D��$����%��i���9q��'�^��N�7i~
P�`]��E�e�m���f�N����A�^����>���-�^-��S�'D8����7?3�D�;!��M6+k4<|�\�tl���������LLPsb���y� ��8����+�����Y�������D0&>���(>�����l&�����qD�gM��(���������=�D�t�����%c.?@0g�eN1hU�,s���`�%��MFr�e�AT�f��������l���3^#�%�hJ�K*�� H��tac���P#���D_�2�08�I��_G?��!R��x���l	��c:LvP�	�|���� ��jg��Mba���x�SY��;(��1�y��8�]��%&��Wj��q���#@�t�5B_��W!�(U�,q��jJ���M�����7���#�s��B;�����)'h##NC���C�X��
������jw�S=At3V�2�lO�����5b�kM����|��ce��"�p�+5�)h���S�t��j�9�T���0Y�8~�x�Z����y��.j�Z�c�_gKge[�+>e[�19?��R,���y� �����;d:'����\�|'�����d.c���0�(�^F�H�s��e�
�����*Wy{�}�p�~�R�)a�k���	�9���qx
��t:�R���r}Rm��b^����K��T�CHC=�T��R�I�dY�����=��x����D���Fvv�O��${�;���.�`��6{��9���L�*3������d��O��������+SBE8-������2�T�A;$�v�!�mpU�����r�[��N������� �#�����q����g3�E$��uN�p(��?�����[J�X)�g�<�����JM���u;Y�v�
��]4\��B���lo��\F�f��O'q��h�Tx��]�b2�.�\>��|1�D*.�=(�s^s��9��"��R'�����Y��O7.a&`��yj��	���d�8��`t�n6E�O�=#��1d]�E'Y����P�.�2�7��+��8I�J���!?�#O4�����b�q�zx����/�/_�_S*t18z
�T���SU���#�Pq:1�4�L�����|����zW!8��2zEj.�/A�0Ek�-$k�'�`h�������ULeF"(�a#�e���4�d����>N�,i$"YB�vKa�A0�
������V��P���1���.A�@�{��Ogj����G��x�>f�����`�p��2��7�1US����o)CM�J`{�>������y����fS���)9D�)���_�m�����OO���Z}:���	��������j%�Y��iG��\�dx���@k&�O3�D���6O��U��R�K2uu�]��T�+B��Q�����M�s/g���X��-]y�*��V�u���)�C��c�-��|�c���R��N!��	 ����5�/�u�M�~O�;/����|
ezU�BLx
Lb��a7�ZvL�R�
�&0�5K������V�\���W)g��D���p��;�#1�z��L���ocS��Q�k3�m�����������s���x��K#YQ���[�J��k�nA�����Q5w�E��5";�*���?��=�d&�AkP����.h*b����MQz�(��-'��=��2Y�H~�L?f�{������YR&�[%�	F��s
�i$�o"�����r1��y�/\-��R�I`@���'fI���I�Y�^[hH>"I����Ae����R�H�-_�0[�7��X2�pK���3��U���T�����v�]6q��Kp��q~!������0����Iy�y������;�h�e��"f461��P��3�Z���~s!6Y�#tUfR[�7P�0�B%��X���
0�zX�^a-3��Re2D�0"����Y~�<
cF�Mh{yu8���D
�,���+l"�>KG�h11��p_��-n)�$�M���D���q�4S���-���L�I�'�6`��)3���Z���=��p���)���\���0��b!��1�%d>&9B�.Fz��&d��7�G�,����O��<w���MW5�ri�TP]�N���A�8/�v{gNZj������-�&�9d T(#���������>\��0��A��$�<4MD�D��]���$��4(R�2Jz��hB���D"�����������r%����e<��
�{K4AVOC��cQ=e�/�
e���Z���aV~�}����'n "�A�;2\����L��rtY����d�H���&�7=G���������{��#�k�F�M��i�Y���(Z���A��� 2J�[L��jx���������-��s�����	���[����~��N9�V%�]q���\���O��{u�\�%�������S��)K85K��c�����'��QR���%R�����h��B&���M��#��GG?�s�VA<��Eb:kA&�����U[�?����\���i�O��;=������c��%�������9J1K�����F�[��P"4im7-�2.-M���|�(�g~ ��Q2��`���<�[��[-#tW�].��P��]���8�p�n�AcH'��s:������p)��tm���X�^Q�Th`������K$��E����H�9
f3I���<(�U�[����������DS��t|��5����VI.�*��Hi$I*pF�r��n��{#a�?:>�"��	+��p�r��w�����x���MmN�LU����oX�'{��*Q��K*�)T�dfTS�5��b��uJ��D
yYXA��_q]nf��%4��{K���~M�.�Wa������"d�!(����pErd6���,�,m�|����{�Qn�����R�0'K��o[K
��F*AB?������V&Am�i���c�m6��/���x���M��
R��v������aw��e!������0'.%�1z�?L���:��p�.�-_F�1����u*��U���?<{���z����xC����������U����������
��=������������o ����c)�������
�.�95�Q����J+U�9���u��*���1�CK>\C�*���j�f�+%pq���>h	��:b'��SM������%�>��EV�����o1�#�o�Bb~�C#2�H"BFp� �k��Y�=(���#��z����e	�TNx��G)��0���)�����V����w{M�R���6�i���_����0J�GEe�?Y�H�Jx?g�Cy���������;���o����!jy�:�Ivu4��r��������j�>�������Cy��e��GlP�S3���'��1���rM����!���P0�/,��0�+�MW��<z�S���[�vx�(���r�<X*2��G�
Y�Td��SUW���e���b�`�b�+�zK�&�����A���������8&L���Qju���&��d�Y����(��������H�'L�J#(�*��u��i
N�%l�\}�. +;�H.Hb����.g��A��%���SC�Q��M=FqFha_�Vk�5�C%�hf���-&�e����u'#��e>1���2���9�������@f�b'(]�ZwR��n^�p���{�I��FF�u&��;�m���3�6��l�_p��0L�4�[/��m�qy>��������{��t�����miS�����r�w�NQ3l(]�UU�b@x9�cF(��HC�i����S)i����N���'ecDj�����cQe�k
=�����:�T�����6����!����L��/KO�k���j{�~(�������eH*�i�_?A�
fo2u�k
(��FU���������
�(��5h�h���,����(��y��CJ�)m�o\#�:$v����
�R�A�������2��0����\�1�(�� �b�zGe�mxWnd3gd�����#��al�D�)���o�5�����A{!�E��
i�u�=���O9�iJ�H4��(��HV5��� #��j8����dP>1J�����*�I��i<�c�g����v��"��9=�w�4�<�P@X�1*��r�OZmD����(�O(��*�5l[�\��0��RPUS�\l�%�����ES�Fh
b����T;J�����o�N��C�F)��19M�$#K���K��9�;a��.;r+���)�^�?p���9Z5�4����E�l���F1���������F+�2�i _��)]������p�����x��3\����CPoVS2*0!�Vy��)����j�Z����sB���}4�a~�����%$i�\��E%i���q��YHZ=����I�J�ft��Ug�z�:��m������*4p�x�M@�jx%1���B\���	����Nq�CeRNU��Z�$��]>�.�|�"��P_�L�/��>Q7O�L[������[�������&�o�$������
��kjB���X���2��h"��b�Tu��T������8�`#h��%�9�����f��hJ���v:�t��Q+'|x���T�r��ieEe�t5�������n�[[���,��4�"Z�Y��(]D4C
���C�m���`���C���4��+�;m�>oOw��]]����Aej�9W+ak�;�"���/{B������������U.t+�p���O�CA�r,���G�����rx�=Q
�NJC�X��
N����+[���npZ��l��<Tw!�)��@I 8W���K�54���9(<�`���}���k����|����~����G�`�~q�������W/0����-���Z.���"L_�^rPKf��h��Z!�"��iu���]:���e�$���)����	��8��x��xmI�X�(����*���)yB�s��eR2�����!
*����J��r� ����W��nY��!$V�TH�9���o�w}�����R"�#J���i����m���7)?��A�+��R�X_.��R�C�z��` #e�4���Pe��K��/�_��m��U��t��A^C
�aV!Y6�4���S?�[6]G���lFP���&1:g��>m��}�Q���'.�/1;��Q�w��5�
Y�VZY^�d�P�s)t��4K��@�P�W\u����"�G�}b�W���2[GS�p����"������I��
��E�m	r���0`{k�
h
���a��F�Nh
����&c�61$�74���?���Z��B���&H4Tr�Q4�[n�6R7���:s��jR���P��J��\��=�T�c��R�bZ��2f�g{�M���"�3�h��
�
z�^r�
�����M~q>|��O?�����W/�v��r��V�Q��hNmU�h�h]N7������eHR�:�H����t��,�����>����$�O��|����,�Pb^%��]����8�#E(�Jb|*MH^T��oS��t:������rv	9������\�7�qf��u��v��o�_
���3�[<���5�*�1���Z^�����9wF�3�1����[d4W�.����adj����u�X�N4D	�TU
w��[���I����h�|�w�C6��1�B�i0+Qj�(3�e��x[�K����d�H�@��y�X$M7�M�.���S:G�|����P������(����	�+1`������_0�R�(���#�r;|D_�Q��aL!�D?L1���2��EY�����[B�6�? #��>�����T�.u�
�b0��1:��l�a�,��������5c��~x�L*y��/HY�����A �o��T������Rh�4����������_�]n�����z1j�@��e@�e��U1�}$l�J����6��:�pcR�`+`���	��!> �8U=r����D�*S*Wl��!����@����$AXq�f��������"�1��Q�0��'u�(xl �)��\@}E�e��$��(m/fz)���b7-7���U��=L@�n���gZC���3)��H�/�i��VdV0����..��0��wr��Y��v5�����Kv�O5��:�K}���,����YI�<7������;L{'p����NJU�����#<�S��w�����i�H'����/��K�
~��7c�����;<��,J���XM'�,��)���x�����=Ye�o9���~k�W�g	�f����Hn/(� ����f<���`�|h��8��:0��	�9B����,���&��VS�^�Qz����1�t�Tq�4')��r�7��ie9��9m��LD��j���#��k�?^�Z��P���V�)�=T����M���Jo�@)��� �a��KU���!��,
1�vH����N�4��J=��S�\� !��*b��dc��ef���nZ~���"uf_F[M\j��f����������-�Ke��\8^�"4�Z��]��3 B�x��r���JIK3x(��o���Y<�Y��"�����W����P��4�N*4�ER�!����z7D��2����ygE��.�pA\XObt~7O�f���g�����6��wU�l��LA�dy`y,�$K�5;Bh�sv�j�Wq��H�D���/'��[T[I����*Z$���:e��D�/��"I�"��mkb�
��{	A��T�������1K����X_�j�$��2
�#�C�X%�G�����������	APk���ajJ�8b�����\��\<_���]^�#L��/�p�i�{�P\��\<OG@s���H�h����M��qb�R�����=�'��\��=��D�QbQ3��iE3�FR���U�����Mf)����{"������YTZ���r�U�K�C� �Ja0��d��RV(�����0M�;�T"X!i�R!�Y��~i�QbrJ��q�E�t�J4W�T�"��JDx+S�'�F����)��<zeou�����4������F������x
��R��
���Z����r�L����C����qH0(�B`1�
���B��r1��
E0�b0�'�c?&(q���v���p����;1����'K�Y���h`��Z1�4�L�	�=qi"�����vd/|q�9�2��+ ���^+�5���(�v�2Z�X����B��$�)��]v*?D9�4�x���7���hh�-�Q�Vp�6J��iVsY5�]�-Z�d�"��v�N��.��MR��A���g�������Ecq�Hl&vC�m�B�r�p�;9M��6��Vz 5������WN���m�����B.�!�o��K&[��Ce2��A��b�Y3�R 2N�}V/l#�N�m� �5tk{u�����r	�P�R�P��i�&p#�������	{L�j�^54�������j����I����_�Q�{���}���C�kY`�7]��4��������=��v���27���3���}����+%M"������R�J��V�E���������a���j$�+Y��w	�y������B:����b��t�:3)M��K3��n�T�$����-����U�}�"V!fJ�2����������d����N�Kl����:�������:�feWu7�c(y�\�#���-��YOg��|��O�6�L1��kY�o�������w�q0����-��in��a���W`��"���.�WC�'}�����W�<N������5���Ji����.c[���|��o���O�?���=?�������?�?�z����Og?�~F���9K1\,�7�o��a��N!��,T2oN@��6j������2���0�i.�,�Zjy��<���]���DT�Jz���y�����"h8���_gM8�f�_�j��e�������fo4nu�a)P�����>�������-*2]a�G���_.���t
��SDA��%g��-�B����*��oY�	W^p���[p����t1�m&�	$�JN�z�Z��������q�j6G����wx���m���"��_>Y'�%���`1��$�2�t�������~���/^��U�"�h��FF�ggO��V����e,W�
�|��@�:��Y$'��R�����1����KZ���4q5��*�M�8���/�j��U[N/�C�_�S���`h]b�����
J=��+f�tu�R\�����;�4���[�Q�5��;��'A;���X��i;�X���L]�hY�)���-#{^��F��K	d��QD���L x�ed���Y2$'��C������\^
��
N9������Nlw���O�>�o�g�	��v�/*�������s�n9i���
)�|N19�OUZ?T[4���.[fHB�O��84���5�T����I1P�O!�h���Q��/��&fV
����S�.�f�(��#rJ%E�	�jJ���=�v�6�������:q��[�����l��8����y����$�-<������A�.�ih��G�j��6#05���%wh������0���d�[��Hn'Yu�
>�:��/I$���X� �K�t�0���g��&���<v��8y����S����~-8����n������i�9��Y�)��.4; ~�s�1������?A6h
�,��A���-h
Lf��W�C7�r������nc���r�D
.Gs�X�3�*�����$W%@�0�v�{<�������?�VZ�2�#G�����x���+��7����U���E����^��i3�����<�H�����������}4�����(I��x����t��7��|
��U�)��k������8���c
fX��g(^�����4�(�Hl��7O.~<��6$�8Ys)���D���6y?�?$MP��J6`���^�'��f��qk��Ci�)��KBa���)}b��5^n���c��k�5k��(����f5r���c4
2���~��9Y���j"Jh������u���\���&01� �����2���9m����bZ�h�eG5!5�T�C#r>�G_��{�f�?��a��?5�/>�X�h��G7�>��q	�����"�=�tT?� ���XO����c�$RY�m 	�����l�������9
PM�!���
5-�E�j�2=*���TeT��T\~���a���Z4�A��Rz"��r�r��U
��`�k��|���\��P�h�=G+����@�n�T�8z��u��9���M�:Z9��hU
z�!m��*��#R�i�����!@5���O���#��!AY
F��g������S*��*�G�#���OW�����u'{8�e��9h��A����?_��|������������E

#120

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#118)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 6:08 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

In current create_grouping_paths() (without any of your patches
applied) we first create partial paths in partially grouped rel and
then add parallel path to grouped rel using those partial paths. Then
we hand over this to FDW and extension hooks, which may add partial
paths, which might throw away a partial path used to create a parallel
path in grouped rel causing a segfault. I think this bug exists since
we introduced parallel aggregation or upper relation refactoring
whichever happened later. Introduction of partially grouped rel has
just made it visible.

I don't think there's really a problem here; or at least if there is,
I'm not seeing it. If an FDW wants to introduce partially grouped
paths, it should do so when it is called for
UPPERREL_PARTIAL_GROUP_AGG from within
add_paths_to_partial_grouping_rel. If it wants to introduce fully
grouped paths, it should do so when it is called for
UPPERREL_GROUP_AGG from within create_grouping_paths. By the time the
latter call is made, it's too late to add partially grouped paths; if
the FDW does, that's a bug in the FDW.

Admittedly, this means that commit
3bf05e096b9f8375e640c5d7996aa57efd7f240c subtly changed the API
contract for FDWs. Before that, an FDW that wanted to support partial
aggregation would have needed to add partially grouped paths to
UPPERREL_GROUP_AGG when called for that relation; whereas now it would
need to add them to UPPERREL_PARTIAL_GROUP_AGG when called for that
relation. This doesn't actually falsify any documentation, though,
because this oddity wasn't documented before. Possibly more
documentation could stand to be written in this area, but that's not
the topic of this thread.
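
To illustrate the contract described above, here is a minimal, self-contained sketch. It uses mock types rather than the real planner headers: MockUpperRelKind, MockRel and mock_fdw_get_upper_paths() are made-up names, and the counters merely stand in for path lists. The point is only that the hook adds each kind of path when it is called for the matching stage.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the two upper-relation stages named in this thread. */
typedef enum MockUpperRelKind
{
    MOCK_UPPERREL_PARTIAL_GROUP_AGG,
    MOCK_UPPERREL_GROUP_AGG
} MockUpperRelKind;

/* Hypothetical relation that only counts the paths added to it. */
typedef struct MockRel
{
    int         n_partial_agg_paths;    /* partially grouped paths */
    int         n_final_agg_paths;      /* fully grouped paths */
} MockRel;

/*
 * Hypothetical FDW hook.  output_rel is the relation that belongs to the
 * given stage; the hook adds only the kind of path that fits that stage.
 */
void
mock_fdw_get_upper_paths(MockUpperRelKind stage, MockRel *output_rel)
{
    assert(output_rel != NULL);

    switch (stage)
    {
        case MOCK_UPPERREL_PARTIAL_GROUP_AGG:
            output_rel->n_partial_agg_paths++;
            break;
        case MOCK_UPPERREL_GROUP_AGG:
            /* Too late for partially grouped paths; add only final ones. */
            output_rel->n_final_agg_paths++;
            break;
    }
}
```

A caller would invoke the hook once per stage, each time with that stage's own relation, so the fully grouped relation never receives partially grouped paths.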

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#121

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#119)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 9:46 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hmm.. you are right. Done.

I don't see a reason to hold off on committing 0002 and 0003, so I've
done that now; since they are closely related changes, I pushed them
as a single commit. It probably could've just been included in the
main patch, but it's fine.

I don't much like the code that 0001 refactors and am not keen to
propagate it into more places. I've separately proposed patches to
restructure that code in
/messages/by-id/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
and if we end up deciding to adopt that approach then I think this
patch will also need to create rels for UPPERREL_TLIST. I suspect
that approach would also remove the need for 0004, as that case would
also end up being handled in a different way. However, the jury is
still out on whether or not the approach I've proposed there is any
good. Feel free to opine over on that thread.

I'm going to go spend some time looking at 0005 next. It looks to me
like it's generally going in a very promising direction, but I need to
study the details more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#122

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#121)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

I'm going to go spend some time looking at 0005 next. It looks to me
like it's generally going in a very promising direction, but I need to
study the details more.

On further study this patch is doing a number of things, some of which
seem like better ideas than others. Splitting out the degenerate
grouping case looks like a great idea. I've committed a patch to do
this loosely based on 0005, but I whacked it around quite a bit. I
think the result is cleaner than what you had.

As far as the stuff in GroupPathExtraData extra is concerned:

- can_hash and can_sort look good; we can precompute them once and
reuse them for every grouping relation. Cool.

- can_partial_agg looks a bit pointless. You're not going to save
many CPU cycles by computing a value that is derived from two Booleans
and storing the result in another Boolean variable.

- partial_costs_set. The comment in compute_group_path_extra_data
doesn't look good. It says "Set partial aggregation costs if we are
going to calculate partial aggregates in make_grouping_rels()", but
what it means is something more like "agg_partial_costs and
agg_final_costs are not valid yet". I also wonder if there's a way
that we can figure out in advance whether we're going to need to do
this and just do it at the appropriate place in the code, as opposed
to doing it lazily. Even if there were rare cases where we did it
unnecessarily I'm not sure that would be any big deal.

- agg_partial_costs and agg_final_costs themselves seem OK.

- consider_parallel is only used in make_grouping_rels and could be
replaced with a local variable. Its comment in relation.h doesn't
make a lot of sense either, as it is not used to double-check
anything.

- The remaining fields vary across the partitioning hierarchy, and it
seems a little strange to store them in this structure alongside the
pre-computed stuff that doesn't change. I'm not quite sure what to do
about that; obviously passing around 15-20 arguments to a function
isn't too desirable either.

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels. It seems to me
for example that the way you're computing consider_parallel for the
child relations is kind of pointless. The parallel-safety of the
grouping_target can't vary across children, nor can that of the
havingQual; the only thing that can change is whether the input rel
has consider_parallel set. You could cater to that by having
GroupPathExtraData do something like extra.grouping_is_parallel_safe =
target_parallel_safe && is_parallel_safe(root, havingQual) and then
set each child's consider_parallel flag to
input_rel->consider_parallel && extra.grouping_is_parallel_safe.
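
A rough sketch of that idea, with mock types instead of the real planner structs. MockGroupExtra, MockRel and both function names are invented for illustration, and the two Boolean inputs stand in for the target and havingQual parallel-safety checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the per-query data computed once at the top level. */
typedef struct MockGroupExtra
{
    bool        grouping_is_parallel_safe;
} MockGroupExtra;

/* Mock relation carrying only the flag of interest. */
typedef struct MockRel
{
    bool        consider_parallel;
} MockRel;

/* Done once: these inputs cannot vary across child relations. */
void
mock_init_extra(MockGroupExtra *extra,
                bool target_parallel_safe,
                bool having_parallel_safe)
{
    assert(extra != NULL);
    extra->grouping_is_parallel_safe =
        target_parallel_safe && having_parallel_safe;
}

/* Done per child: only the input rel's flag can differ. */
void
mock_set_child_consider_parallel(MockRel *child_grouped_rel,
                                 const MockRel *input_rel,
                                 const MockGroupExtra *extra)
{
    assert(child_grouped_rel != NULL && input_rel != NULL && extra != NULL);
    child_grouped_rel->consider_parallel =
        input_rel->consider_parallel && extra->grouping_is_parallel_safe;
}
```

With this split, the expensive safety checks run once per query and each child costs a single Boolean AND.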

Similarly, right now the way the patch sets the reltargets for
grouping rels and partially grouping rels is a bit complex.
make_grouping_rels() calls make_partial_grouping_target() separately
for each partial grouping rel, but for non-partial grouping rels it
gets the translated tlist as an argument. Could we instead consider
always building the tlist by translation from the parent, that is, a
child grouped rel's tlist is the translation of the parent
grouped_rel's tlist, and the child partially grouped rel's tlist is
the translation of the parent partially_grouped_rel's tlist? If you
could both make that work and find a different place to compute the
partial agg costs, make_grouping_rels() would get a lot simpler or
perhaps go away entirely.

I don't like this condition which appears in that function:

if (extra->try_parallel_aggregation || force_partial_agg ||
(extra->partitionwise_grouping &&
extra->partial_partitionwise_grouping))

The problem with that is that it's got to exactly match the criteria
for whether we're going to need the partial_grouping_rel. If it's
true when we are not using partial paths, then you've missed an
optimization; in the reverse case, we'll probably crash or fail to
consider paths we should have considered. It is not entirely
straightforward to verify that this test is correct.
add_paths_to_partial_grouping_rel() gets called if
extra->try_parallel_aggregation is true or if
extra->is_partial_aggregation is true, but the condition doesn't test
extra->is_partial_aggregation at all. The other way that we can end up
using partially_grouped_rel is if create_partitionwise_grouping_paths
is called, but it just silently fails to do anything if we have no
partially_grouped_rel. Putting all that together, do the conditions
under which a partially_grouped_rel gets created match the conditions
under which we want to have one? Beats me! Moreover, even if it's
correct now, I think that the chances that the next person who
modifies this code will manage to keep it correct are not great. I
think we need to create the partial grouping rel somewhere in the code
that's closer to where it's actually needed, so that we don't have so
much action at a distance, or at least have a simpler and more
transparent set of tests.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#123

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#122)

3 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels.

On further review, it seems like a better idea is to generate the
partial grouping relations from the grouping relations to which they
correspond. Attached is a series of proposed further refactoring
patches.

0001 moves the creation of partially_grouped_rel downwards. Instead
of happening in create_grouping_paths(), it gets moved downward to
add_paths_to_partial_grouping_rel(), which is renamed
create_partial_grouping_paths() and now returns a pointer to a new
RelOptInfo. This seems like a better design than what we've got now:
it avoids creating the partially grouped relation if we don't need it,
and it looks more like the other upper planner functions
(create_grouping_paths, create_ordered_paths, etc.) which all create
and return a new relation. One possible objection to this line of
attack is that Jeevan's
0006-Implement-partitionwise-aggregation-paths-for-partit.patch patch
adds an additional Boolean argument to that function so that it can be
called twice, once for partial paths and a second time for non-partial
paths. But it looks to me like we should instead just add separate
handling to this function for the pathlist in each place where we're
currently handling the partial_pathlist. That's more like what we do
elsewhere and avoids complicating the code with a bunch of
conditionals. To make the generation of partially_grouped_rel from
grouped_rel work cleanly, this also sets grouped_rel's reltarget.
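
The shape of that design, as a minimal sketch with mock types. MockRel and mock_create_partial_grouping_paths() are hypothetical, and for brevity the sketch folds the "do we need it?" check into the function itself, which is an assumption rather than a description of what 0001 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mock relation: counts stand in for pathlist and partial_pathlist. */
typedef struct MockRel
{
    int         n_paths;
    int         n_partial_paths;
} MockRel;

/*
 * Build and return the partially grouped relation, or NULL when partial
 * aggregation is impossible or there are no partial input paths.  The
 * caller owns the returned memory and releases it with free().
 */
MockRel *
mock_create_partial_grouping_paths(const MockRel *input_rel,
                                   bool can_partial_agg)
{
    MockRel    *partially_grouped_rel;

    assert(input_rel != NULL);

    if (!can_partial_agg || input_rel->n_partial_paths == 0)
        return NULL;

    partially_grouped_rel = calloc(1, sizeof(MockRel));
    if (partially_grouped_rel == NULL)
        return NULL;

    /* One partially aggregated path per partial input path. */
    partially_grouped_rel->n_partial_paths = input_rel->n_partial_paths;
    return partially_grouped_rel;
}
```

Because the relation is created where it is used, the caller only has to test the result against NULL instead of repeating the creation criteria elsewhere.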

0002 moves the determination of which grouping strategies are possible
upwards. It represents them as a 'flags' variable with bits for
GROUPING_CAN_USE_SORT, GROUPING_CAN_USE_HASH, and
GROUPING_CAN_PARTIAL_AGG. These are set in create_grouping_paths()
and passed down to create_ordinary_grouping_paths(). The idea is that
the flags value would be passed down to the partition-wise aggregate
code which in turn would call create_ordinary_grouping_paths() for the
child grouping relations, so that the relevant determinations are made
only at the top level. This patch also renames can_parallel_agg to
can_partial_agg and removes the parallelism-specific bits from it. To
compensate for this, create_ordinary_grouping_paths() now tests the
removed conditions instead. This is all good stuff for partition-wise
aggregate, since the grouped_rel->consider_parallel &&
input_rel->partial_pathlist != NIL conditions can vary on a per-child
basis but the rest of the stuff can't. In some subsequent patch, the
test should be pushed down inside create_partial_grouping_paths()
itself, so that this function can handle both partial and non-partial
paths as mentioned in the preceding paragraph.
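
For readers who have not opened the patch, a small sketch of the flags idea. The three flag names come from the description above; the bit values and both mock_ functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names as in the description above; the bit values are assumed. */
#define GROUPING_CAN_USE_SORT       0x0001
#define GROUPING_CAN_USE_HASH       0x0002
#define GROUPING_CAN_PARTIAL_AGG    0x0004

/* Top level: decide once which strategies are possible for the query. */
int
mock_compute_grouping_flags(bool can_sort, bool can_hash,
                            bool can_partial_agg)
{
    int         flags = 0;

    if (can_sort)
        flags |= GROUPING_CAN_USE_SORT;
    if (can_hash)
        flags |= GROUPING_CAN_USE_HASH;
    if (can_partial_agg)
        flags |= GROUPING_CAN_PARTIAL_AGG;
    return flags;
}

/*
 * Per relation: combine the query-wide flag with the two conditions that
 * can vary from child to child.
 */
bool
mock_try_partial_paths(int flags, bool rel_consider_parallel,
                       bool input_has_partial_paths)
{
    return (flags & GROUPING_CAN_PARTIAL_AGG) != 0 &&
        rel_consider_parallel && input_has_partial_paths;
}
```

The same flags value can then be handed unchanged to every child grouping relation, while the per-relation test is re-evaluated for each child.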

0003 is a cleanup patch. It removes the grouping target as a separate
argument from a bunch of places that no longer need it given that 0001
sets grouped_rel->reltarget.

I think 0001 and 0002 together get us pretty close to having the stuff
that should be done for every child rel
(create_ordinary_grouping_paths) disentangled from the stuff that
should be done only once (create_grouping_paths). There are a couple
of exceptions:

- create_partial_grouping_paths() is still doing
get_agg_clause_costs() for the partial grouping target, which (I
think) only needs to be done once. Possibly we could handle that by
having create_grouping_paths() do that work whenever it sets
GROUPING_CAN_PARTIAL_AGG and pass the value downward. You might
complain that it won't get used unless either there are partial paths
available for the input rel OR partition-wise aggregate is used --
there's no point in partially aggregating a non-partial path at the
top level. We could just accept that as not a big deal, or maybe we
can figure out how to make it conditional so that we only do it when
either the input_rel has a partial path list or we have child rels.
Or we could do as you did in your patches and save it when we compute
it first, reusing it on each subsequent call. Or maybe there's some
other idea.

- These patches don't do anything about
create_partial_grouping_paths() and create_ordinary_grouping_paths()
directly referencing parse->targetList and parse->havingQual. I think
that we could add those as additional arguments to
create_ordinary_grouping_paths(). create_grouping_paths() would pass
the values taken from "parse" and partition-wise aggregate would pass down
translated versions.
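
For the first exception, the "save it when we compute it first" option could look roughly like the sketch below. Everything here is a mock: MockAggCosts stands in for AggClauseCosts, mock_compute_costs() stands in for the get_agg_clause_costs() work, and the cost numbers and the counter are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for AggClauseCosts; the single field is illustrative. */
typedef struct MockAggCosts
{
    double      total_cost;
} MockAggCosts;

typedef struct MockGroupExtra
{
    bool        partial_costs_set;  /* are the two cost structs valid? */
    MockAggCosts agg_partial_costs;
    MockAggCosts agg_final_costs;
    int         n_cost_computations;    /* instrumentation for the sketch */
} MockGroupExtra;

/* Stand-in for the real cost computation; the numbers are arbitrary. */
void
mock_compute_costs(MockGroupExtra *extra)
{
    extra->agg_partial_costs.total_cost = 1.0;
    extra->agg_final_costs.total_cost = 2.0;
    extra->n_cost_computations++;
}

/* Compute the costs on first use and reuse them on every later call. */
const MockAggCosts *
mock_get_partial_costs(MockGroupExtra *extra)
{
    assert(extra != NULL);

    if (!extra->partial_costs_set)
    {
        mock_compute_costs(extra);
        extra->partial_costs_set = true;
    }
    return &extra->agg_partial_costs;
}
```

However many child relations ask for the costs, the computation runs once; the price is one more piece of mutable state that every caller has to go through.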

I am sort of unclear whether we need/want GroupPathExtraData at all.
What determines whether something gets passed via GroupPathExtraData
or just as a separate argument? If we have a rule that stuff that is
common to all child grouped rels goes in there and other stuff
doesn't, or stuff postgres_fdw needs goes in there and other stuff
doesn't, then that might be OK. But I'm not sure that there is such a
rule in the v20 patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Don-t-pass-grouping-target-around-unnecessarily.patch (application/octet-stream)

From 821ada11ee49a7b28277d4480b4b6abc2d5ecad3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:51:48 -0400
Subject: [PATCH 3/3] Don't pass grouping target around unnecessarily.

---
 src/backend/optimizer/plan/planner.c  | 41 +++++++++++++----------------------
 src/backend/optimizer/util/pathnode.c |  4 ++--
 src/include/optimizer/pathnode.h      |  2 --
 3 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a414f482d8..00e49d4353 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -163,10 +163,10 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 static bool is_degenerate_grouping(PlannerInfo *root);
 static void create_degenerate_grouping_paths(PlannerInfo *root,
 								 RelOptInfo *input_rel,
-								 PathTarget *target, RelOptInfo *grouped_rel);
+								 RelOptInfo *grouped_rel);
 static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
-							   PathTarget *target, RelOptInfo *grouped_rel,
+							   RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd, int flags);
 static void consider_groupingsets_paths(PlannerInfo *root,
@@ -174,7 +174,6 @@ static void consider_groupingsets_paths(PlannerInfo *root,
 							Path *path,
 							bool is_sorted,
 							bool can_hash,
-							PathTarget *target,
 							grouping_sets_data *gd,
 							const AggClauseCosts *agg_costs,
 							double dNumGroups);
@@ -220,7 +219,6 @@ static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
-						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -3736,7 +3734,7 @@ create_grouping_paths(PlannerInfo *root,
 	 * grouping, as appropriate.
 	 */
 	if (is_degenerate_grouping(root))
-		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
+		create_degenerate_grouping_paths(root, input_rel, grouped_rel);
 	else
 	{
 		int			flags = 0;
@@ -3787,7 +3785,7 @@ create_grouping_paths(PlannerInfo *root,
 		if (can_partial_agg(root, agg_costs))
 			flags |= GROUPING_CAN_PARTIAL_AGG;
 
-		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
+		create_ordinary_grouping_paths(root, input_rel, grouped_rel,
 									   agg_costs, gd, flags);
 	}
 
@@ -3825,7 +3823,7 @@ is_degenerate_grouping(PlannerInfo *root)
  */
 static void
 create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-								 PathTarget *target, RelOptInfo *grouped_rel)
+								 RelOptInfo *grouped_rel)
 {
 	Query	   *parse = root->parse;
 	int			nrows;
@@ -3847,7 +3845,7 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		{
 			path = (Path *)
 				create_result_path(root, grouped_rel,
-								   target,
+								   grouped_rel->reltarget,
 								   (List *) parse->havingQual);
 			paths = lappend(paths, path);
 		}
@@ -3860,14 +3858,13 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   false,
 							   NIL,
 							   -1);
-		path->pathtarget = target;
 	}
 	else
 	{
 		/* No grouping sets, or just one, so one output row */
 		path = (Path *)
 			create_result_path(root, grouped_rel,
-							   target,
+							   grouped_rel->reltarget,
 							   (List *) parse->havingQual);
 	}
 
@@ -3886,7 +3883,7 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
  */
 static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-							   PathTarget *target, RelOptInfo *grouped_rel,
+							   RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd, int flags)
 {
@@ -3925,7 +3922,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
 
 	/* Build final grouping paths */
-	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
 							  partially_grouped_rel, agg_costs,
 							  &agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
@@ -3964,7 +3961,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 							Path *path,
 							bool is_sorted,
 							bool can_hash,
-							PathTarget *target,
 							grouping_sets_data *gd,
 							const AggClauseCosts *agg_costs,
 							double dNumGroups)
@@ -4106,7 +4102,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 				 create_groupingsets_path(root,
 										  grouped_rel,
 										  path,
-										  target,
 										  (List *) parse->havingQual,
 										  strat,
 										  new_rollups,
@@ -4264,7 +4259,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 					 create_groupingsets_path(root,
 											  grouped_rel,
 											  path,
-											  target,
 											  (List *) parse->havingQual,
 											  AGG_MIXED,
 											  rollups,
@@ -4281,7 +4275,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 				 create_groupingsets_path(root,
 										  grouped_rel,
 										  path,
-										  target,
 										  (List *) parse->havingQual,
 										  AGG_SORTED,
 										  gd->rollups,
@@ -6083,7 +6076,6 @@ get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
 static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
-						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -6121,7 +6113,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				if (parse->groupingSets)
 				{
 					consider_groupingsets_paths(root, grouped_rel,
-												path, true, can_hash, target,
+												path, true, can_hash,
 												gd, agg_costs, dNumGroups);
 				}
 				else if (parse->hasAggs)
@@ -6134,7 +6126,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_agg_path(root,
 											 grouped_rel,
 											 path,
-											 target,
+											 grouped_rel->reltarget,
 											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
 											 AGGSPLIT_SIMPLE,
 											 parse->groupClause,
@@ -6152,7 +6144,6 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_group_path(root,
 											   grouped_rel,
 											   path,
-											   target,
 											   parse->groupClause,
 											   havingQual,
 											   dNumGroups));
@@ -6195,7 +6186,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_agg_path(root,
 											 grouped_rel,
 											 path,
-											 target,
+											 grouped_rel->reltarget,
 											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
 											 AGGSPLIT_FINAL_DESERIAL,
 											 parse->groupClause,
@@ -6207,7 +6198,6 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_group_path(root,
 											   grouped_rel,
 											   path,
-											   target,
 											   parse->groupClause,
 											   havingQual,
 											   dNumGroups));
@@ -6225,7 +6215,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 * Try for a hash-only groupingsets path over unsorted input.
 			 */
 			consider_groupingsets_paths(root, grouped_rel,
-										cheapest_path, false, true, target,
+										cheapest_path, false, true,
 										gd, agg_costs, dNumGroups);
 		}
 		else
@@ -6250,7 +6240,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				add_path(grouped_rel, (Path *)
 						 create_agg_path(root, grouped_rel,
 										 cheapest_path,
-										 target,
+										 grouped_rel->reltarget,
 										 AGG_HASHED,
 										 AGGSPLIT_SIMPLE,
 										 parse->groupClause,
@@ -6278,7 +6268,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						 create_agg_path(root,
 										 grouped_rel,
 										 path,
-										 target,
+										 grouped_rel->reltarget,
 										 AGG_HASHED,
 										 AGGSPLIT_FINAL_DESERIAL,
 										 parse->groupClause,
@@ -6416,7 +6406,6 @@ create_partial_grouping_paths(PlannerInfo *root,
 									 create_group_path(root,
 													   partially_grouped_rel,
 													   path,
-													   partially_grouped_rel->reltarget,
 													   parse->groupClause,
 													   NIL,
 													   dNumPartialGroups));
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b4582d4..22133fcf12 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2651,12 +2651,12 @@ GroupPath *
 create_group_path(PlannerInfo *root,
 				  RelOptInfo *rel,
 				  Path *subpath,
-				  PathTarget *target,
 				  List *groupClause,
 				  List *qual,
 				  double numGroups)
 {
 	GroupPath  *pathnode = makeNode(GroupPath);
+	PathTarget *target = rel->reltarget;
 
 	pathnode->path.pathtype = T_Group;
 	pathnode->path.parent = rel;
@@ -2828,7 +2828,6 @@ GroupingSetsPath *
 create_groupingsets_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 Path *subpath,
-						 PathTarget *target,
 						 List *having_qual,
 						 AggStrategy aggstrategy,
 						 List *rollups,
@@ -2836,6 +2835,7 @@ create_groupingsets_path(PlannerInfo *root,
 						 double numGroups)
 {
 	GroupingSetsPath *pathnode = makeNode(GroupingSetsPath);
+	PathTarget *target = rel->reltarget;
 	ListCell   *lc;
 	bool		is_first = true;
 	bool		is_first_sort = true;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index ef7173fbf8..381bc30813 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -178,7 +178,6 @@ extern SortPath *create_sort_path(PlannerInfo *root,
 extern GroupPath *create_group_path(PlannerInfo *root,
 				  RelOptInfo *rel,
 				  Path *subpath,
-				  PathTarget *target,
 				  List *groupClause,
 				  List *qual,
 				  double numGroups);
@@ -200,7 +199,6 @@ extern AggPath *create_agg_path(PlannerInfo *root,
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 Path *subpath,
-						 PathTarget *target,
 						 List *having_qual,
 						 AggStrategy aggstrategy,
 						 List *rollups,
-- 
2.14.3 (Apple Git-98)

0002-Pull-grouping-strategies-determination-up.patch (application/octet-stream)

From 2e1ad092508576d7f3b66f2a25eafcf437da6da3 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:28:45 -0400
Subject: [PATCH 2/3] Pull grouping strategies determination up.

---
 src/backend/optimizer/plan/planner.c | 142 ++++++++++++++++++++---------------
 1 file changed, 82 insertions(+), 60 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0d7f2e7975..a414f482d8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -93,6 +93,25 @@ typedef struct
 	List	   *groupClause;	/* overrides parse->groupClause */
 } standard_qp_extra;
 
+/*
+ * Various flags indicating what kinds of grouping are possible.
+ *
+ * GROUPING_CAN_USE_SORT should be set if it's possible to perform
+ * sort-based implementations of grouping.  When grouping sets are in use,
+ * this will be true if sorting is potentially usable for any of the grouping
+ * sets, even if it's not usable for all of them.
+ *
+ * GROUPING_CAN_USE_HASH should be set if it's possible to perform
+ * hash-based implementations of grouping.
+ *
+ * GROUPING_CAN_PARTIAL_AGG should be set if the aggregation is of a type
+ * for which we support partial aggregation (not, for example, grouping sets).
+ * It says nothing about parallel-safety or the availability of suitable paths.
+ */
+#define GROUPING_CAN_USE_SORT       0x0001
+#define GROUPING_CAN_USE_HASH       0x0002
+#define GROUPING_CAN_PARTIAL_AGG	0x0004
+
 /*
  * Data specific to grouping sets
  */
@@ -149,7 +168,7 @@ static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
-							   grouping_sets_data *gd);
+							   grouping_sets_data *gd, int flags);
 static void consider_groupingsets_paths(PlannerInfo *root,
 							RelOptInfo *grouped_rel,
 							Path *path,
@@ -214,8 +233,8 @@ static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
 							  bool can_sort,
 							  bool can_hash,
 							  AggClauseCosts *agg_final_costs);
-static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
+static bool can_partial_agg(PlannerInfo *root,
+				const AggClauseCosts *agg_costs);
 
 
 /*****************************************************************************
@@ -3719,8 +3738,58 @@ create_grouping_paths(PlannerInfo *root,
 	if (is_degenerate_grouping(root))
 		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
 	else
+	{
+		int			flags = 0;
+
+		/*
+		 * Determine whether it's possible to perform sort-based
+		 * implementations of grouping.  (Note that if groupClause is empty,
+		 * grouping_is_sortable() is trivially true, and all the
+		 * pathkeys_contained_in() tests will succeed too, so that we'll
+		 * consider every surviving input path.)
+		 *
+		 * If we have grouping sets, we might be able to sort some but not all
+		 * of them; in this case, we need can_sort to be true as long as we
+		 * must consider any sorted-input plan.
+		 */
+		if ((gd && gd->rollups != NIL)
+			|| grouping_is_sortable(parse->groupClause))
+			flags |= GROUPING_CAN_USE_SORT;
+
+		/*
+		 * Determine whether we should consider hash-based implementations of
+		 * grouping.
+		 *
+		 * Hashed aggregation only applies if we're grouping. If we have
+		 * grouping sets, some groups might be hashable but others not; in
+		 * this case we set can_hash true as long as there is nothing globally
+		 * preventing us from hashing (and we should therefore consider plans
+		 * with hashes).
+		 *
+		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+		 * BY aggregates.  (Doing so would imply storing *all* the input
+		 * values in the hash table, and/or running many sorts in parallel,
+		 * either of which seems like a certain loser.)  We similarly don't
+		 * support ordered-set aggregates in hashed aggregation, but that case
+		 * is also included in the numOrderedAggs count.
+		 *
+		 * Note: grouping_is_hashable() is much more expensive to check than
+		 * the other gating conditions, so we want to do it last.
+		 */
+		if ((parse->groupClause != NIL &&
+			 agg_costs->numOrderedAggs == 0 &&
+			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+			flags |= GROUPING_CAN_USE_HASH;
+
+		/*
+		 * Determine whether partial aggregation is possible.
+		 */
+		if (can_partial_agg(root, agg_costs))
+			flags |= GROUPING_CAN_PARTIAL_AGG;
+
 		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
-									   agg_costs, gd);
+									   agg_costs, gd, flags);
+	}
 
 	set_cheapest(grouped_rel);
 	return grouped_rel;
@@ -3819,15 +3888,15 @@ static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
-							   grouping_sets_data *gd)
+							   grouping_sets_data *gd, int flags)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *partially_grouped_rel = NULL;
 	AggClauseCosts agg_final_costs; /* parallel only */
 	double		dNumGroups;
-	bool		can_hash;
-	bool		can_sort;
+	bool		can_hash = (flags & GROUPING_CAN_USE_HASH) != 0;
+	bool		can_sort = (flags & GROUPING_CAN_USE_SORT) != 0;
 
 	/*
 	 * Estimate number of groups.
@@ -3837,50 +3906,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 									  gd,
 									  parse->targetList);
 
-	/*
-	 * Determine whether it's possible to perform sort-based implementations
-	 * of grouping.  (Note that if groupClause is empty,
-	 * grouping_is_sortable() is trivially true, and all the
-	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
-	 * every surviving input path.)
-	 *
-	 * If we have grouping sets, we might be able to sort some but not all of
-	 * them; in this case, we need can_sort to be true as long as we must
-	 * consider any sorted-input plan.
-	 */
-	can_sort = (gd && gd->rollups != NIL)
-		|| grouping_is_sortable(parse->groupClause);
-
-	/*
-	 * Determine whether we should consider hash-based implementations of
-	 * grouping.
-	 *
-	 * Hashed aggregation only applies if we're grouping. If we have grouping
-	 * sets, some groups might be hashable but others not; in this case we set
-	 * can_hash true as long as there is nothing globally preventing us from
-	 * hashing (and we should therefore consider plans with hashes).
-	 *
-	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
-	 * aggregates.  (Doing so would imply storing *all* the input values in
-	 * the hash table, and/or running many sorts in parallel, either of which
-	 * seems like a certain loser.)  We similarly don't support ordered-set
-	 * aggregates in hashed aggregation, but that case is also included in the
-	 * numOrderedAggs count.
-	 *
-	 * Note: grouping_is_hashable() is much more expensive to check than the
-	 * other gating conditions, so we want to do it last.
-	 */
-	can_hash = (parse->groupClause != NIL &&
-				agg_costs->numOrderedAggs == 0 &&
-				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
-
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
 	 * partially grouped paths; that way, later code can easily consider both
 	 * parallel and non-parallel approaches to grouping.
 	 */
 	MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-	if (can_parallel_agg(root, input_rel, grouped_rel, agg_costs))
+	if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL
+		&& (flags & GROUPING_CAN_PARTIAL_AGG) != 0)
 		partially_grouped_rel =
 			create_partial_grouping_paths(root,
 										  grouped_rel,
@@ -6480,28 +6513,17 @@ create_partial_grouping_paths(PlannerInfo *root,
 }
 
 /*
- * can_parallel_agg
+ * can_partial_agg
  *
- * Determines whether or not parallel grouping and/or aggregation is possible.
+ * Determines whether or not partial grouping and/or aggregation is possible.
  * Returns true when possible, false otherwise.
  */
 static bool
-can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs)
+can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 {
 	Query	   *parse = root->parse;
 
-	if (!grouped_rel->consider_parallel)
-	{
-		/* Not even parallel-safe. */
-		return false;
-	}
-	else if (input_rel->partial_pathlist == NIL)
-	{
-		/* Nothing to use as input for partial aggregate. */
-		return false;
-	}
-	else if (!parse->hasAggs && parse->groupClause == NIL)
+	if (!parse->hasAggs && parse->groupClause == NIL)
 	{
 		/*
 		 * We don't know how to do parallel aggregation unless we have either
-- 
2.14.3 (Apple Git-98)
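
The flag handling that the 0002 patch introduces can be tried outside the planner. What follows is only a minimal sketch: the three GROUPING_* values are copied from the patch, but the GroupingFacts struct and both helper functions are hypothetical stand-ins for the planner state and checks (grouping_is_sortable(), grouping_is_hashable(), can_partial_agg(), and so on) that the real code consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in the 0002 patch. */
#define GROUPING_CAN_USE_SORT       0x0001
#define GROUPING_CAN_USE_HASH       0x0002
#define GROUPING_CAN_PARTIAL_AGG    0x0004

/*
 * Hypothetical summary of the facts the planner derives from the query.
 * None of these field names exist in PostgreSQL.
 */
typedef struct GroupingFacts
{
	bool		sortable;			/* rollups exist or the clause is sortable */
	bool		has_group_clause;	/* parse->groupClause != NIL */
	int			num_ordered_aggs;	/* agg_costs->numOrderedAggs */
	bool		hashable;			/* any_hashable / grouping_is_hashable() */
	bool		partial_agg_ok;		/* result of can_partial_agg() */
} GroupingFacts;

/* Mirror of the flag computation the patch moves into create_grouping_paths(). */
static int
compute_grouping_flags(const GroupingFacts *facts)
{
	int			flags = 0;

	assert(facts != NULL);

	if (facts->sortable)
		flags |= GROUPING_CAN_USE_SORT;

	/* Hashing needs a GROUP BY, no ordered aggregates, and hashable columns. */
	if (facts->has_group_clause &&
		facts->num_ordered_aggs == 0 &&
		facts->hashable)
		flags |= GROUPING_CAN_USE_HASH;

	if (facts->partial_agg_ok)
		flags |= GROUPING_CAN_PARTIAL_AGG;

	return flags;
}

/*
 * Mirror of the test in create_ordinary_grouping_paths(): the flag alone is
 * not enough, the rel must also be parallel-safe and have partial paths.
 */
static bool
should_build_partial_rel(int flags, bool consider_parallel,
						 bool have_partial_paths)
{
	return consider_parallel && have_partial_paths &&
		(flags & GROUPING_CAN_PARTIAL_AGG) != 0;
}
```

The point of the split is visible in should_build_partial_rel(): the flag records only whether the aggregation is of a kind that supports partial aggregation, while parallel-safety and the availability of partial input paths are checked separately by the caller.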

0001-Push-creation-of-partially_grouped_rel-down.patch (application/octet-stream)

From fcb66b8e5610ad927748588f9abebf96bc222229 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:08:19 -0400
Subject: [PATCH 1/3] Push creation of partially_grouped_rel down.

---
 src/backend/optimizer/plan/planner.c | 272 ++++++++++++++++++-----------------
 1 file changed, 139 insertions(+), 133 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9c4a1baf5f..0d7f2e7975 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -148,7 +148,6 @@ static void create_degenerate_grouping_paths(PlannerInfo *root,
 static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
-							   RelOptInfo *partially_grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd);
 static void consider_groupingsets_paths(PlannerInfo *root,
@@ -208,13 +207,13 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
 						  double dNumGroups, List *havingQual);
-static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
-								  RelOptInfo *input_rel,
-								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
-								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash);
+static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
+							  RelOptInfo *grouped_rel,
+							  RelOptInfo *input_rel,
+							  grouping_sets_data *gd,
+							  bool can_sort,
+							  bool can_hash,
+							  AggClauseCosts *agg_final_costs);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
 
@@ -3688,42 +3687,30 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
-	RelOptInfo *partially_grouped_rel;
 
 	/*
 	 * For now, all aggregated paths are added to the (GROUP_AGG, NULL)
-	 * upperrel.  Paths that are only partially aggregated go into the
-	 * (UPPERREL_PARTIAL_GROUP_AGG, NULL) upperrel.
+	 * upperrel.
 	 */
 	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
-	partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
-											NULL);
+	grouped_rel->reltarget = target;
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
 	 * can't be parallel-safe, either.  Otherwise, it's parallel-safe if the
-	 * target list and HAVING quals are parallel-safe.  The partially grouped
-	 * relation obeys the same rules.
+	 * target list and HAVING quals are parallel-safe.
 	 */
 	if (input_rel->consider_parallel && target_parallel_safe &&
 		is_parallel_safe(root, (Node *) parse->havingQual))
-	{
 		grouped_rel->consider_parallel = true;
-		partially_grouped_rel->consider_parallel = true;
-	}
 
 	/*
-	 * If the input rel belongs to a single FDW, so does the grouped rel. Same
-	 * for the partially_grouped_rel.
+	 * If the input rel belongs to a single FDW, so does the grouped rel.
 	 */
 	grouped_rel->serverid = input_rel->serverid;
 	grouped_rel->userid = input_rel->userid;
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
-	partially_grouped_rel->serverid = input_rel->serverid;
-	partially_grouped_rel->userid = input_rel->userid;
-	partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
-	partially_grouped_rel->fdwroutine = input_rel->fdwroutine;
 
 	/*
 	 * Create either paths for a degenerate grouping or paths for ordinary
@@ -3733,7 +3720,7 @@ create_grouping_paths(PlannerInfo *root,
 		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
 	else
 		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
-									   partially_grouped_rel, agg_costs, gd);
+									   agg_costs, gd);
 
 	set_cheapest(grouped_rel);
 	return grouped_rel;
@@ -3831,18 +3818,16 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
-							   RelOptInfo *partially_grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
-	AggClauseCosts agg_partial_costs;	/* parallel only */
+	RelOptInfo *partially_grouped_rel = NULL;
 	AggClauseCosts agg_final_costs; /* parallel only */
 	double		dNumGroups;
 	bool		can_hash;
 	bool		can_sort;
-	bool		try_parallel_aggregation;
 
 	/*
 	 * Estimate number of groups.
@@ -3889,60 +3874,22 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 				agg_costs->numOrderedAggs == 0 &&
 				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
 
-	/*
-	 * Figure out whether a PartialAggregate/Finalize Aggregate execution
-	 * strategy is viable.
-	 */
-	try_parallel_aggregation = can_parallel_agg(root, input_rel, grouped_rel,
-												agg_costs);
-
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
-	 * partial paths for partially_grouped_rel; that way, later code can
-	 * easily consider both parallel and non-parallel approaches to grouping.
+	 * partially grouped paths; that way, later code can easily consider both
+	 * parallel and non-parallel approaches to grouping.
 	 */
-	if (try_parallel_aggregation)
-	{
-		PathTarget *partial_grouping_target;
-
-		/*
-		 * Build target list for partial aggregate paths.  These paths cannot
-		 * just emit the same tlist as regular aggregate paths, because (1) we
-		 * must include Vars and Aggrefs needed in HAVING, which might not
-		 * appear in the result tlist, and (2) the Aggrefs must be set in
-		 * partial mode.
-		 */
-		partial_grouping_target = make_partial_grouping_target(root, target,
-															   (Node *) parse->havingQual);
-		partially_grouped_rel->reltarget = partial_grouping_target;
-
-		/*
-		 * Collect statistics about aggregates for estimating costs of
-		 * performing aggregation in parallel.
-		 */
-		MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
-		MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-		if (parse->hasAggs)
-		{
-			/* partial phase */
-			get_agg_clause_costs(root, (Node *) partial_grouping_target->exprs,
-								 AGGSPLIT_INITIAL_SERIAL,
-								 &agg_partial_costs);
-
-			/* final phase */
-			get_agg_clause_costs(root, (Node *) target->exprs,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-			get_agg_clause_costs(root, parse->havingQual,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-		}
+	MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
+	if (can_parallel_agg(root, input_rel, grouped_rel, agg_costs))
+		partially_grouped_rel =
+			create_partial_grouping_paths(root,
+										  grouped_rel,
+										  input_rel,
+										  gd,
+										  can_sort,
+										  can_hash,
+										  &agg_final_costs);
 
-		add_paths_to_partial_grouping_rel(root, input_rel,
-										  partially_grouped_rel,
-										  &agg_partial_costs,
-										  gd, can_sort, can_hash);
-	}
 
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
@@ -6189,46 +6136,49 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * Instead of operating directly on the input relation, we can
 		 * consider finalizing a partially aggregated path.
 		 */
-		foreach(lc, partially_grouped_rel->pathlist)
+		if (partially_grouped_rel != NULL)
 		{
-			Path	   *path = (Path *) lfirst(lc);
-
-			/*
-			 * Insert a Sort node, if required.  But there's no point in
-			 * sorting anything but the cheapest path.
-			 */
-			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+			foreach(lc, partially_grouped_rel->pathlist)
 			{
-				if (path != partially_grouped_rel->cheapest_total_path)
-					continue;
-				path = (Path *) create_sort_path(root,
-												 grouped_rel,
-												 path,
-												 root->group_pathkeys,
-												 -1.0);
-			}
+				Path	   *path = (Path *) lfirst(lc);
 
-			if (parse->hasAggs)
-				add_path(grouped_rel, (Path *)
-						 create_agg_path(root,
-										 grouped_rel,
-										 path,
-										 target,
-										 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
-										 AGGSPLIT_FINAL_DESERIAL,
-										 parse->groupClause,
-										 havingQual,
-										 agg_final_costs,
-										 dNumGroups));
-			else
-				add_path(grouped_rel, (Path *)
-						 create_group_path(root,
-										   grouped_rel,
-										   path,
-										   target,
-										   parse->groupClause,
-										   havingQual,
-										   dNumGroups));
+				/*
+				 * Insert a Sort node, if required.  But there's no point in
+				 * sorting anything but the cheapest path.
+				 */
+				if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+				{
+					if (path != partially_grouped_rel->cheapest_total_path)
+						continue;
+					path = (Path *) create_sort_path(root,
+													 grouped_rel,
+													 path,
+													 root->group_pathkeys,
+													 -1.0);
+				}
+
+				if (parse->hasAggs)
+					add_path(grouped_rel, (Path *)
+							 create_agg_path(root,
+											 grouped_rel,
+											 path,
+											 target,
+											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+											 AGGSPLIT_FINAL_DESERIAL,
+											 parse->groupClause,
+											 havingQual,
+											 agg_final_costs,
+											 dNumGroups));
+				else
+					add_path(grouped_rel, (Path *)
+							 create_group_path(root,
+											   grouped_rel,
+											   path,
+											   target,
+											   parse->groupClause,
+											   havingQual,
+											   dNumGroups));
+			}
 		}
 	}
 
@@ -6279,10 +6229,10 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 
 		/*
 		 * Generate a Finalize HashAgg Path atop of the cheapest partially
-		 * grouped path. Once again, we'll only do this if it looks as though
-		 * the hash table won't exceed work_mem.
+		 * grouped path, assuming there is one. Once again, we'll only do this
+		 * if it looks as though the hash table won't exceed work_mem.
 		 */
-		if (partially_grouped_rel->pathlist)
+		if (partially_grouped_rel && partially_grouped_rel->pathlist)
 		{
 			Path	   *path = partially_grouped_rel->cheapest_total_path;
 
@@ -6307,29 +6257,83 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 }
 
 /*
- * add_paths_to_partial_grouping_rel
+ * create_partial_grouping_paths
  *
+ * Create a new upper relation representing the result of partial aggregation
+ * and populate it with appropriate paths.  This is a two-step process.
  * First, generate partially aggregated partial paths from the partial paths
  * for the input relation, and then generate partially aggregated non-partial
- * paths using Gather or Gather Merge.  All paths for this relation -- both
- * partial and non-partial -- have been partially aggregated but require a
- * subsequent FinalizeAggregate step.
+ * paths using Gather or Gather Merge.
+ *
+ * All paths for this new upper relation -- both partial and non-partial --
+ * have been partially aggregated but require a subsequent FinalizeAggregate
+ * step.
  */
-static void
-add_paths_to_partial_grouping_rel(PlannerInfo *root,
-								  RelOptInfo *input_rel,
-								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
-								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash)
+static RelOptInfo *
+create_partial_grouping_paths(PlannerInfo *root,
+							  RelOptInfo *grouped_rel,
+							  RelOptInfo *input_rel,
+							  grouping_sets_data *gd,
+							  bool can_sort,
+							  bool can_hash,
+							  AggClauseCosts *agg_final_costs)
 {
 	Query	   *parse = root->parse;
+	RelOptInfo *partially_grouped_rel;
+	AggClauseCosts agg_partial_costs;
 	Path	   *cheapest_partial_path = linitial(input_rel->partial_pathlist);
 	Size		hashaggtablesize;
 	double		dNumPartialGroups = 0;
 	ListCell   *lc;
 
+	/*
+	 * Build a new upper relation to represent the result of partially
+	 * aggregating the rows from the input relation.
+	 */
+	partially_grouped_rel = fetch_upper_rel(root,
+											UPPERREL_PARTIAL_GROUP_AGG,
+											grouped_rel->relids);
+	partially_grouped_rel->consider_parallel =
+		grouped_rel->consider_parallel;
+	partially_grouped_rel->serverid = grouped_rel->serverid;
+	partially_grouped_rel->userid = grouped_rel->userid;
+	partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
+	partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+
+	/*
+	 * Build target list for partial aggregate paths.  These paths cannot just
+	 * emit the same tlist as regular aggregate paths, because (1) we must
+	 * include Vars and Aggrefs needed in HAVING, which might not appear in
+	 * the result tlist, and (2) the Aggrefs must be set in partial mode.
+	 */
+	partially_grouped_rel->reltarget =
+		make_partial_grouping_target(root, grouped_rel->reltarget,
+									 (Node *) parse->havingQual);
+
+	/*
+	 * Collect statistics about aggregates for estimating costs of performing
+	 * aggregation in parallel.
+	 */
+	MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
+	if (parse->hasAggs)
+	{
+		List	   *partial_target_exprs;
+
+		/* partial phase */
+		partial_target_exprs = partially_grouped_rel->reltarget->exprs;
+		get_agg_clause_costs(root, (Node *) partial_target_exprs,
+							 AGGSPLIT_INITIAL_SERIAL,
+							 &agg_partial_costs);
+
+		/* final phase */
+		get_agg_clause_costs(root, (Node *) grouped_rel->reltarget->exprs,
+							 AGGSPLIT_FINAL_DESERIAL,
+							 agg_final_costs);
+		get_agg_clause_costs(root, parse->havingQual,
+							 AGGSPLIT_FINAL_DESERIAL,
+							 agg_final_costs);
+	}
+
 	/* Estimate number of partial groups. */
 	dNumPartialGroups = get_number_of_groups(root,
 											 cheapest_partial_path->rows,
@@ -6372,7 +6376,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 													 AGGSPLIT_INITIAL_SERIAL,
 													 parse->groupClause,
 													 NIL,
-													 agg_partial_costs,
+													 &agg_partial_costs,
 													 dNumPartialGroups));
 				else
 					add_partial_path(partially_grouped_rel, (Path *)
@@ -6394,7 +6398,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 
 		hashaggtablesize =
 			estimate_hashagg_tablesize(cheapest_partial_path,
-									   agg_partial_costs,
+									   &agg_partial_costs,
 									   dNumPartialGroups);
 
 		/*
@@ -6412,7 +6416,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 											 AGGSPLIT_INITIAL_SERIAL,
 											 parse->groupClause,
 											 NIL,
-											 agg_partial_costs,
+											 &agg_partial_costs,
 											 dNumPartialGroups));
 		}
 	}
@@ -6471,6 +6475,8 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 
 	/* Now choose the best path(s) */
 	set_cheapest(partially_grouped_rel);
+
+	return partially_grouped_rel;
 }
 
 /*
-- 
2.14.3 (Apple Git-98)

#124

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#120)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 7:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 15, 2018 at 6:08 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

In current create_grouping_paths() (without any of your patches
applied) we first create partial paths in partially grouped rel and
then add parallel path to grouped rel using those partial paths. Then
we hand over this to FDW and extension hooks, which may add partial
paths, which might throw away a partial path used to create a parallel
path in grouped rel causing a segfault. I think this bug exists since
we introduced parallel aggregation or upper relation refactoring
whichever happened later. Introduction of partially grouped rel has
just made it visible.

I don't think there's really a problem here; or at least if there is,
I'm not seeing it. If an FDW wants to introduce partially grouped
paths, it should do so when it is called for
UPPERREL_PARTIAL_GROUP_AGG from within
add_paths_to_partial_grouping_rel. If it wants to introduce fully
grouped paths, it should do so when it is called for
UPPERREL_GROUP_AGG from within create_grouping_paths. By the time the
latter call is made, it's too late to add partially grouped paths; if
the FDW does, that's a bug in the FDW.

Right.

Admittedly, this means that commit
3bf05e096b9f8375e640c5d7996aa57efd7f240c subtly changed the API
contract for FDWs. Before that, an FDW that wanted to support partial
aggregation would have needed to add partially grouped paths to
UPPERREL_GROUP_AGG when called for that relation; whereas now it would
need to add them to UPPERREL_PARTIAL_GROUP_AGG when called for that
relation.

And when an FDW added partial paths to the grouped rel
(UPPERREL_GROUP_AGG), it might throw away the partial paths gathered by
the core code. 3bf05e096b9f8375e640c5d7996aa57efd7f240c fixed that
bug by allowing an FDW to add partial paths to UPPERREL_PARTIAL_GROUP_AGG
before Gather paths are added to UPPERREL_GROUP_AGG. But I agree that
it has subtly changed the API, and we need to add this to the
documentation (maybe we should have added that in the commit which
introduced the partially grouped relation).

This doesn't actually falsify any documentation, though,
because this oddity wasn't documented before. Possibly more
documentation could stand to be written in this area, but that's not
the topic of this thread.

+1.
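
The contract discussed above (partially grouped paths go in when the hook is called for UPPERREL_PARTIAL_GROUP_AGG, fully grouped paths when it is called for UPPERREL_GROUP_AGG) can be modelled in a few lines. This is a toy sketch only: MockUpperRelKind, MockRel and mock_fdw_add_upper_paths() are hypothetical names invented for illustration and are not the PostgreSQL FDW API, and the "paths" are just counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two upper-relation stages under discussion. */
typedef enum MockUpperRelKind
{
	MOCK_UPPERREL_PARTIAL_GROUP_AGG,
	MOCK_UPPERREL_GROUP_AGG
} MockUpperRelKind;

/* Hypothetical relation: records which stage it represents and what it holds. */
typedef struct MockRel
{
	MockUpperRelKind kind;
	int			n_partially_grouped;	/* paths still needing a Finalize step */
	int			n_fully_grouped;		/* paths producing final results */
} MockRel;

/*
 * Toy FDW callback.  It adds only the kind of path that matches the stage it
 * is called for, and refuses (returns false) if handed a rel of another
 * stage.  Adding partially grouped paths once the grouped rel is being built
 * would be too late, which is the FDW-side bug described in the thread.
 */
static bool
mock_fdw_add_upper_paths(MockUpperRelKind stage, MockRel *output_rel)
{
	assert(output_rel != NULL);

	if (output_rel->kind != stage)
		return false;

	switch (stage)
	{
		case MOCK_UPPERREL_PARTIAL_GROUP_AGG:
			output_rel->n_partially_grouped++;
			break;
		case MOCK_UPPERREL_GROUP_AGG:
			output_rel->n_fully_grouped++;
			break;
	}
	return true;
}
```

In the real planner nothing enforces this the way the kind check does here; the sketch only makes the expected ordering explicit.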

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#125

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#121)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 15, 2018 at 9:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 15, 2018 at 9:46 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Hmm.. you are right. Done.

I don't see a reason to hold off on committing 0002 and 0003, so I've
done that now; since they are closely related changes, I pushed them
as a single commit. It probably could've just been included in the
main patch, but it's fine.

Thanks.

I don't much like the code that 0001 refactors and am not keen to
propagate it into more places. I've separately proposed patches to
restructure that code in
/messages/by-id/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
and if we end up deciding to adopt that approach then I think this
patch will also need to create rels for UPPERREL_TLIST. I suspect
that approach would also remove the need for 0004, as that case would
also end up being handled in a different way. However, the jury is
still out on whether or not the approach I've proposed there is any
good. Feel free to opine over on that thread.

Will take a look at that patch next week.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#126

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#122)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 16, 2018 at 12:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:

- partial_costs_set. The comment in compute_group_path_extra_data
doesn't look good. It says "Set partial aggregation costs if we are
going to calculate partial aggregates in make_grouping_rels()", but
what it means is something more like "agg_partial_costs and
agg_final_costs are not valid yet". I also wonder if there's a way
that we can figure out in advance whether we're going to need to do
this and just do it at the appropriate place in the code, as opposed
to doing it lazily. Even if there were rare cases where we did it
unnecessarily I'm not sure that would be any big deal.

I struggled with this one. In case of multi-level partitioning we will
compute fully aggregated results per partition in the levels for which
the partition keys are covered by GROUP BY clause. But beyond those
levels we will compute partial aggregates. When none of the levels can
use parallelism, only those levels which compute the partial
aggregates need these costs to be calculated. We will need to traverse
partitioning hierarchy before even starting aggregation to decide
whether we will need partial aggregation downwards. Instead of adding
that walker, I thought it better to use this flag. But more on this in
reply to your next mail.

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels. It seems to me
for example that the way you're computing consider_parallel for the
child relations is kind of pointless. The parallel-safety of the
grouping_target can't vary across children, nor can that of the
havingQual; the only thing that can change is whether the input rel
has consider_parallel set. You could cater to that by having
GroupPathExtraData do something like extra.grouping_is_parallel_safe =
target_parallel_safe && is_parallel_safe(root, havingQual) and then
set each child's consider parallel flag to
input_rel->consider_parallel && extra.grouping_is_parallel_safe.
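
To make the suggestion above concrete, here is a minimal sketch in plain C. It is not planner code: SketchRel, SketchGroupExtra and both functions are hypothetical stand-ins, and the two boolean arguments stand for the results of the parallel-safety checks on the grouping target and on havingQual. The point is only that the invariant part is computed once and each child combines it with its own input rel's flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a RelOptInfo. */
typedef struct SketchRel
{
	bool		consider_parallel;
} SketchRel;

/* Hypothetical stand-in for the data shared by all child grouped rels. */
typedef struct SketchGroupExtra
{
	bool		grouping_is_parallel_safe;	/* cannot vary across children */
} SketchGroupExtra;

/* Done once at the top level: combine the two invariant safety checks. */
void
sketch_init_extra(SketchGroupExtra *extra,
				  bool target_parallel_safe,
				  bool having_parallel_safe)
{
	assert(extra != NULL);
	extra->grouping_is_parallel_safe =
		target_parallel_safe && having_parallel_safe;
}

/* Done per child: only the input rel's flag differs between children. */
void
sketch_set_child_consider_parallel(SketchRel *child_grouped_rel,
								   const SketchRel *child_input_rel,
								   const SketchGroupExtra *extra)
{
	assert(child_grouped_rel != NULL && child_input_rel != NULL);
	child_grouped_rel->consider_parallel =
		child_input_rel->consider_parallel &&
		extra->grouping_is_parallel_safe;
}
```

With this shape, no per-child call needs to re-run a parallel-safety walk over the target or the HAVING qual.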

I am actually confused by the current code itself. What parallel
workers compute is partial target which is different from the full
target. The partial target would only contain expressions in the
GROUP BY clause and partial aggregate nodes. It will not contain any
expressions comprising full aggregates. When partial aggregate
nodes are parallel safe but the expressions using the full aggregates
are not parallel safe, the current code will not allow parallel
aggregation to take place whereas it should. That looks like an
optimization we are missing today.

That bug aside, the point is the target of grouped relation and that
of the partially grouped relation are different, giving rise to the
possibility that one is parallel safe and other is not. In fact, for
partially grouped relation, we shouldn't check parallel safety of
havingQual since havingQual is only applicable over the fully
aggregated result. That seems to be another missing optimization. OR I
am missing something here.

In case of partition-wise aggregates some levels may be performing
only partial aggregation and if partial aggregation is parallel safe,
we should allow those levels to run parallel partial aggregation.
Checking parallel safety of only grouped relation doesn't help here.
But since the optimization is already missing right now, I think this
patch shouldn't bother about it.

Similarly, right now the way the patch sets the reltargets for
grouping rels and partially grouping rels is a bit complex.
make_grouping_rels() calls make_partial_grouping_target() separately
for each partial grouping rel, but for non-partial grouping rels it
gets the translated tlist as an argument. Could we instead consider
always building the tlist by translation from the parent, that is, a
child grouped rel's tlist is the translation of the parent
grouped_rel's tlist, and the child partially grouped rel's tlist is
the translation of the parent partially_grouped_rel's tlist? If you
could both make that work and find a different place to compute the
partial agg costs, make_grouping_rels() would get a lot simpler or
perhaps go away entirely.

Hmm that's a thought. While we are translating, we allocate new nodes,
whereas make_partial_grouping_target() uses same nodes from the full
target. For a partially grouped child relation, this means that we
will allocate nodes to create partial target as well and then setrefs
will spend cycles matching node trees instead of matching pointers.
But I think we can take that hit if it saves us some complexity in the
code.

I don't like this condition which appears in that function:

    if (extra->try_parallel_aggregation || force_partial_agg ||
        (extra->partitionwise_grouping &&
         extra->partial_partitionwise_grouping))

The problem with that is that it's got to exactly match the criteria
for whether we're going to need the partial_grouping_rel. If it's
true when we are not using partial paths, then you've missed an
optimization; in the reverse case, we'll probably crash or fail to
consider paths we should have considered.

It is not entirely
straightforward to verify that this test is correct.
add_paths_to_partial_grouping_rel() gets called if
extra->try_parallel_aggregation is true or if
extra->is_partial_aggregation is true, but the condition doesn't test
extra->is_partial_aggregation at all.

Why do we need to test extra->is_partial_aggregation? We are testing
force_partial_agg. I agree that we should probably test
is_partial_aggregation, but that doesn't make this condition wrong.

The other way that we can end up
using partially_grouped_rel is if create_partitionwise_grouping_paths
is called, but it just silently fails to do anything if we have no
partially_grouped_rel.

It will create partially_grouped_rel when partition-wise grouping
requires it. So, this sentence seems to contradict itself. I am
confused.

Moreover, even if it's
correct now, I think that the chances that the next person who
modifies this code will manage to keep it correct are not great. I
think we need to create the partial grouping rel somewhere in the code
that's closer to where it's actually needed, so that we don't have so
much action at a distance, or at least have a simpler and more
transparent set of tests.

+1. I agree with that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#127

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#123)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 16, 2018 at 3:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 15, 2018 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I wonder if we could simplify things by copying more information from
the parent grouping rel to the child grouping rels.

On further review, it seems like a better idea is to generate the
partial grouping relations from the grouping relations to which they
correspond. Attached is a series of proposed further refactoring
patches.

Ok. That looks good.

0001 moves the creation of partially_grouped_rel downwards. Instead
of happening in create_grouping_paths(), it gets moved downward to
add_paths_to_partial_grouping_rel(), which is renamed
create_partial_grouping_paths() and now returns a pointer to new
RelOptInfo. This seems like a better design than what we've got now:
it avoids creating the partially grouped relation if we don't need it,
and it looks more like the other upper planner functions
(create_grouping_paths, create_ordered_paths, etc.) which all create
and return a new relation.

I liked that.

0002 moves the determination of which grouping strategies are possible
upwards. It represents them as a 'flags' variable with bits for
GROUPING_CAN_USE_SORT, GROUPING_CAN_USE_HASH, and
GROUPING_CAN_PARTIAL_AGG. These are set in create_grouping_paths()
and passed down to create_ordinary_grouping_paths(). The idea is that
the flags value would be passed down to the partition-wise aggregate
code which in turn would call create_ordinary_grouping_paths() for the
child grouping relations, so that the relevant determinations are made
only at the top level.

+1.
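
For readers following along, the flags idea can be sketched in a few lines of standalone C. The three bit values are the ones defined in the 0002 patch; the two helper functions are hypothetical and exist only for illustration. The decision is made once at the top level, and any child-level call merely decodes the bits it was handed.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as defined in the 0002 patch. */
#define GROUPING_CAN_USE_SORT		0x0001
#define GROUPING_CAN_USE_HASH		0x0002
#define GROUPING_CAN_PARTIAL_AGG	0x0004

/* Hypothetical helper: pack the top-level decisions into one int. */
int
sketch_build_grouping_flags(bool can_sort, bool can_hash, bool can_partial_agg)
{
	int			flags = 0;

	if (can_sort)
		flags |= GROUPING_CAN_USE_SORT;
	if (can_hash)
		flags |= GROUPING_CAN_USE_HASH;
	if (can_partial_agg)
		flags |= GROUPING_CAN_PARTIAL_AGG;
	return flags;
}

/* Hypothetical helper: a child only tests the bits it inherited. */
bool
sketch_flag_is_set(int flags, int bit)
{
	assert(bit != 0);
	return (flags & bit) != 0;
}
```

Passing a single int down keeps the signature of the per-child function stable if more strategy bits are added later.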

This patch also renames can_parallel_agg to
can_partial_agg and removes the parallelism-specific bits from it.

I think we need to update the comments in this function to use the phrase
"partial aggregation" instead of "parallel aggregation". And I think
we need to change the conditions as well. For example if
parse->groupClause == NIL, why can't we do partial aggregation? This
is the classic case where we will need partial aggregation. Probably
we should test this with Jeevan's patches for partition-wise aggregate
to see if it considers partition-wise aggregate or not.

Or, when parse->groupingSets is non-empty, I can see why we can't use
parallel query, but we can still compute partial aggregates. This
condition doesn't hurt here, though, since partition-wise aggregation
bails out when there are grouping sets.

To
compensate for this, create_ordinary_grouping_paths() now tests the
removed conditions instead. This is all good stuff for partition-wise
aggregate, since the grouped_rel->consider_parallel &&
input_rel->partial_pathlist != NIL conditions can vary on a per-child
basis but the rest of the stuff can't. In some subsequent patch, the
test should be pushed down inside create_partial_grouping_paths()
itself, so that this function can handle both partial and non-partial
paths as mentioned in the preceding paragraph.

I think can_parallel_agg() combines two conditions, whether partial
aggregation is possible and whether parallel aggregation is possible.
can_partial_agg() should have the first set and we should retain
can_parallel_agg() for the second set. We may then split
can_parallel_agg() into variant and invariant conditions, i.e. the
conditions which change with input_rel and grouped_rel and those which
don't.
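
A rough sketch of that split, in standalone C. Everything here is hypothetical: SketchQuery and SketchRelPair are simplified summaries rather than Query or RelOptInfo, and the tests inside sketch_can_partial_agg() are illustrative assumptions, not the exact conditions in planner.c. The invariant function depends only on the query; the variant one depends on the rels and so has to be evaluated per child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the query-level facts. */
typedef struct SketchQuery
{
	bool		has_aggs;
	bool		has_group_clause;
	bool		has_grouping_sets;
	bool		aggs_support_partial;	/* every aggregate can be split */
} SketchQuery;

/* Hypothetical summary of one (grouped rel, input rel) pair. */
typedef struct SketchRelPair
{
	bool		grouped_consider_parallel;
	bool		input_has_partial_paths;
} SketchRelPair;

/* Invariant conditions: the same answer for the parent and every child. */
bool
sketch_can_partial_agg(const SketchQuery *q)
{
	assert(q != NULL);
	if (!q->has_aggs && !q->has_group_clause)
		return false;			/* nothing to aggregate or group */
	if (q->has_grouping_sets)
		return false;			/* assumed unsupported, as in the thread */
	return q->aggs_support_partial;
}

/* Variant conditions: these depend on the rels, so check them per child. */
bool
sketch_can_parallel_agg(const SketchRelPair *rels, bool can_partial_agg)
{
	assert(rels != NULL);
	return can_partial_agg &&
		rels->grouped_consider_parallel &&
		rels->input_has_partial_paths;
}
```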

- create_partial_grouping_paths() is still doing
get_agg_clause_costs() for the partial grouping target, which (I
think) only needs to be done once. Possibly we could handle that by
having create_grouping_paths() do that work whenever it sets
GROUPING_CAN_PARTIAL_AGG and pass the value downward. You might
complain that it won't get used unless either there are partial paths
available for the input rel OR partition-wise aggregate is used --
there's no point in partially aggregating a non-partial path at the
top level. We could just accept that as not a big deal, or maybe we
can figure out how to make it conditional so that we only do it when
either the input_rel has a partial path list or we have child rels.
Or we could do as you did in your patches and save it when we compute
it first, reusing it on each subsequent call. Or maybe there's some
other idea.

I am good with anything as long as we avoid repeated computation.
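
One of the options mentioned above, computing the costs on first use and reusing them afterwards, could look roughly like the following standalone C sketch. SketchCostCache and sketch_ensure_partial_costs() are hypothetical names; only the partial_costs_set flag is taken from the earlier discussion, and the assignments are placeholders for the real get_agg_clause_costs() work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for AggClauseCosts. */
typedef struct SketchAggCosts
{
	double		per_tuple;
} SketchAggCosts;

/* Hypothetical cache shared by the parent and all child grouping calls. */
typedef struct SketchCostCache
{
	bool		partial_costs_set;	/* false until the costs are valid */
	SketchAggCosts agg_partial_costs;
	SketchAggCosts agg_final_costs;
	int			computations;		/* instrumentation for this sketch only */
} SketchCostCache;

/* Compute the partial/final aggregation costs at most once. */
void
sketch_ensure_partial_costs(SketchCostCache *cache)
{
	assert(cache != NULL);
	if (cache->partial_costs_set)
		return;					/* already valid, reuse */

	/* Placeholder values standing in for the real cost computation. */
	cache->agg_partial_costs.per_tuple = 1.0;
	cache->agg_final_costs.per_tuple = 1.0;
	cache->computations++;
	cache->partial_costs_set = true;
}
```

The alternative of computing eagerly at the top level would drop the flag entirely, at the price of occasionally doing the work when it is never used.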

I am sort of unclear whether we need/want GroupPathExtraData at all.
What determines whether something gets passed via GroupPathExtraData
or just as a separate argument? If we have a rule that stuff that is
common to all child grouped rels goes in there and other stuff
doesn't, or stuff postgres_fdw needs goes in there and other stuff
doesn't, then that might be OK. But I'm not sure that there is such a
rule in the v20 patches.

We have a single FDW hook for all the upper relations and that hook
cannot accept grouping-specific arguments. Either we need a separate
FDW hook for grouping or we need some way of passing upper-relation-
specific information down to an FDW. I think some FDWs and extensions
will be happy if we provide them ready-made decisions for can_sort,
can_hash, can_partial_agg etc. It will be good if they don't have to
translate the grouping target and havingQual for every child twice,
once for core and a second time in the FDW. All in all, it looks like we
need some structure to hold that information so that we can pass it down
the hook. I am fine with two structures, one variable and the other
invariant. An upper operation can have one of them or both.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#128

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#127)

3 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Ok. That looks good.

Here's an updated version. In this version, based on a voice
discussion with Ashutosh and Jeevan, I adjusted 0001 to combine it
with an earlier idea of splitting Gather/Gather Merge path generation
out of the function that creates partially aggregated paths. The idea
here is that create_ordinary_grouping_paths() could first call
create_partial_grouping_paths(), then add additional paths which might
be partial or non-partial by invoking the partition-wise aggregate
logic, then call gather_grouping_paths() and set_cheapest() to
finalize the partially grouped rel. Also, I added draft commit
messages.

With this patch set applied, the key bit of logic in
create_ordinary_grouping_paths() ends up looking like this:

    if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL
        && (flags & GROUPING_CAN_PARTIAL_AGG) != 0)
    {
        partially_grouped_rel =
            create_partial_grouping_paths(root,
                                          grouped_rel,
                                          input_rel,
                                          gd,
                                          can_sort,
                                          can_hash,
                                          &agg_final_costs);
        gather_grouping_paths(root, partially_grouped_rel);
        set_cheapest(partially_grouped_rel);
    }

I imagine that what the main partition-wise aggregate patch would do
is (1) change the conditions under which
create_partial_grouping_paths() gets called, (2) postpone
gather_grouping_paths() and set_cheapest() until after partition-wise
aggregate had been done, doing them only if partially_grouped_rel !=
NULL. Partition-wise aggregate will need to happen before
add_paths_to_grouping_rel(), though, so that the latter function can
try a FinalizeAggregate node on top of an Append added by
partition-wise aggregate.
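
The ordering described above can be sketched as a small standalone C program fragment. It does not call any planner function: each step is recorded as a single letter so the sequence is easy to inspect, and SketchTrace, sketch_record() and sketch_ordinary_grouping() are hypothetical names. As a simplification, the sketch assumes the partially grouped rel only comes into existence through the first step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_STEPS 8

/* Hypothetical trace of the steps taken, one letter per step. */
typedef struct SketchTrace
{
	char		steps[SKETCH_MAX_STEPS + 1];
	size_t		nsteps;
} SketchTrace;

static void
sketch_record(SketchTrace *trace, char step)
{
	assert(trace->nsteps < SKETCH_MAX_STEPS);
	trace->steps[trace->nsteps++] = step;
	trace->steps[trace->nsteps] = '\0';
}

/* Hypothetical outline of the proposed control flow. */
void
sketch_ordinary_grouping(SketchTrace *trace,
						 bool want_partial_rel,
						 bool have_partitions)
{
	bool		have_partial_rel = false;

	assert(trace != NULL);

	if (want_partial_rel)
	{
		sketch_record(trace, 'P');	/* create_partial_grouping_paths() */
		have_partial_rel = true;
	}

	if (have_partitions)
		sketch_record(trace, 'W');	/* partition-wise aggregate */

	if (have_partial_rel)
	{
		sketch_record(trace, 'G');	/* gather_grouping_paths() */
		sketch_record(trace, 'C');	/* set_cheapest() */
	}

	sketch_record(trace, 'F');	/* add_paths_to_grouping_rel() */
}
```

Reading the trace for a partitioned, parallel-capable input gives P, W, G, C, F: partition-wise aggregate runs after the partial paths are built but before the final paths are added.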

This is a bit strange, because it will mean that partition-wise
aggregate will be attempted BEFORE adding ordinary aggregate paths to
grouped_rel but AFTER adding them to partially_grouped_rel. We could
fix that by splitting add_paths_to_grouping_rel() into two functions,
one of which performs full aggregation directly and the other of which
tries finishing partial aggregation. I'm unsure that's a good idea
though: it would mean that we have very similar logic in two different
functions that could get out of sync as a result of future code
changes, and it's not really fixing any problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Don-t-pass-the-grouping-target-around-unnecessarily.patch (application/octet-stream)

From 814e12b87791121c894d8ac46c25ef33516f3242 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:51:48 -0400
Subject: [PATCH 3/3] Don't pass the grouping target around unnecessarily.

Since the grouped upper relation now sets reltarget, a variety of other
functions can just get it from grouped_rel instead of having to pass it
around explicitly.  Simplify accordingly.

Patch by me, reviewed by Ashutosh Bapat.
---
 src/backend/optimizer/plan/planner.c  | 41 +++++++++++++----------------------
 src/backend/optimizer/util/pathnode.c |  4 ++--
 src/include/optimizer/pathnode.h      |  2 --
 3 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7b623496e3..b452da0204 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -163,10 +163,10 @@ static RelOptInfo *create_grouping_paths(PlannerInfo *root,
 static bool is_degenerate_grouping(PlannerInfo *root);
 static void create_degenerate_grouping_paths(PlannerInfo *root,
 								 RelOptInfo *input_rel,
-								 PathTarget *target, RelOptInfo *grouped_rel);
+								 RelOptInfo *grouped_rel);
 static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
-							   PathTarget *target, RelOptInfo *grouped_rel,
+							   RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd, int flags);
 static void consider_groupingsets_paths(PlannerInfo *root,
@@ -174,7 +174,6 @@ static void consider_groupingsets_paths(PlannerInfo *root,
 							Path *path,
 							bool is_sorted,
 							bool can_hash,
-							PathTarget *target,
 							grouping_sets_data *gd,
 							const AggClauseCosts *agg_costs,
 							double dNumGroups);
@@ -220,7 +219,6 @@ static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 					  List *targets, List *targets_contain_srfs);
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
-						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -3737,7 +3735,7 @@ create_grouping_paths(PlannerInfo *root,
 	 * grouping, as appropriate.
 	 */
 	if (is_degenerate_grouping(root))
-		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
+		create_degenerate_grouping_paths(root, input_rel, grouped_rel);
 	else
 	{
 		int			flags = 0;
@@ -3788,7 +3786,7 @@ create_grouping_paths(PlannerInfo *root,
 		if (can_partial_agg(root, agg_costs))
 			flags |= GROUPING_CAN_PARTIAL_AGG;
 
-		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
+		create_ordinary_grouping_paths(root, input_rel, grouped_rel,
 									   agg_costs, gd, flags);
 	}
 
@@ -3826,7 +3824,7 @@ is_degenerate_grouping(PlannerInfo *root)
  */
 static void
 create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-								 PathTarget *target, RelOptInfo *grouped_rel)
+								 RelOptInfo *grouped_rel)
 {
 	Query	   *parse = root->parse;
 	int			nrows;
@@ -3848,7 +3846,7 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		{
 			path = (Path *)
 				create_result_path(root, grouped_rel,
-								   target,
+								   grouped_rel->reltarget,
 								   (List *) parse->havingQual);
 			paths = lappend(paths, path);
 		}
@@ -3861,14 +3859,13 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   false,
 							   NIL,
 							   -1);
-		path->pathtarget = target;
 	}
 	else
 	{
 		/* No grouping sets, or just one, so one output row */
 		path = (Path *)
 			create_result_path(root, grouped_rel,
-							   target,
+							   grouped_rel->reltarget,
 							   (List *) parse->havingQual);
 	}
 
@@ -3887,7 +3884,7 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
  */
 static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-							   PathTarget *target, RelOptInfo *grouped_rel,
+							   RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd, int flags)
 {
@@ -3929,7 +3926,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	}
 
 	/* Build final grouping paths */
-	add_paths_to_grouping_rel(root, input_rel, grouped_rel, target,
+	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
 							  partially_grouped_rel, agg_costs,
 							  &agg_final_costs, gd, can_sort, can_hash,
 							  dNumGroups, (List *) parse->havingQual);
@@ -3968,7 +3965,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 							Path *path,
 							bool is_sorted,
 							bool can_hash,
-							PathTarget *target,
 							grouping_sets_data *gd,
 							const AggClauseCosts *agg_costs,
 							double dNumGroups)
@@ -4110,7 +4106,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 				 create_groupingsets_path(root,
 										  grouped_rel,
 										  path,
-										  target,
 										  (List *) parse->havingQual,
 										  strat,
 										  new_rollups,
@@ -4268,7 +4263,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 					 create_groupingsets_path(root,
 											  grouped_rel,
 											  path,
-											  target,
 											  (List *) parse->havingQual,
 											  AGG_MIXED,
 											  rollups,
@@ -4285,7 +4279,6 @@ consider_groupingsets_paths(PlannerInfo *root,
 				 create_groupingsets_path(root,
 										  grouped_rel,
 										  path,
-										  target,
 										  (List *) parse->havingQual,
 										  AGG_SORTED,
 										  gd->rollups,
@@ -6087,7 +6080,6 @@ get_partitioned_child_rels_for_join(PlannerInfo *root, Relids join_relids)
 static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
-						  PathTarget *target,
 						  RelOptInfo *partially_grouped_rel,
 						  const AggClauseCosts *agg_costs,
 						  const AggClauseCosts *agg_final_costs,
@@ -6125,7 +6117,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				if (parse->groupingSets)
 				{
 					consider_groupingsets_paths(root, grouped_rel,
-												path, true, can_hash, target,
+												path, true, can_hash,
 												gd, agg_costs, dNumGroups);
 				}
 				else if (parse->hasAggs)
@@ -6138,7 +6130,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_agg_path(root,
 											 grouped_rel,
 											 path,
-											 target,
+											 grouped_rel->reltarget,
 											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
 											 AGGSPLIT_SIMPLE,
 											 parse->groupClause,
@@ -6156,7 +6148,6 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_group_path(root,
 											   grouped_rel,
 											   path,
-											   target,
 											   parse->groupClause,
 											   havingQual,
 											   dNumGroups));
@@ -6199,7 +6190,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_agg_path(root,
 											 grouped_rel,
 											 path,
-											 target,
+											 grouped_rel->reltarget,
 											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
 											 AGGSPLIT_FINAL_DESERIAL,
 											 parse->groupClause,
@@ -6211,7 +6202,6 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 							 create_group_path(root,
 											   grouped_rel,
 											   path,
-											   target,
 											   parse->groupClause,
 											   havingQual,
 											   dNumGroups));
@@ -6229,7 +6219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 			 * Try for a hash-only groupingsets path over unsorted input.
 			 */
 			consider_groupingsets_paths(root, grouped_rel,
-										cheapest_path, false, true, target,
+										cheapest_path, false, true,
 										gd, agg_costs, dNumGroups);
 		}
 		else
@@ -6254,7 +6244,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 				add_path(grouped_rel, (Path *)
 						 create_agg_path(root, grouped_rel,
 										 cheapest_path,
-										 target,
+										 grouped_rel->reltarget,
 										 AGG_HASHED,
 										 AGGSPLIT_SIMPLE,
 										 parse->groupClause,
@@ -6282,7 +6272,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						 create_agg_path(root,
 										 grouped_rel,
 										 path,
-										 target,
+										 grouped_rel->reltarget,
 										 AGG_HASHED,
 										 AGGSPLIT_FINAL_DESERIAL,
 										 parse->groupClause,
@@ -6420,7 +6410,6 @@ create_partial_grouping_paths(PlannerInfo *root,
 									 create_group_path(root,
 													   partially_grouped_rel,
 													   path,
-													   partially_grouped_rel->reltarget,
 													   parse->groupClause,
 													   NIL,
 													   dNumPartialGroups));
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b4582d4..22133fcf12 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2651,12 +2651,12 @@ GroupPath *
 create_group_path(PlannerInfo *root,
 				  RelOptInfo *rel,
 				  Path *subpath,
-				  PathTarget *target,
 				  List *groupClause,
 				  List *qual,
 				  double numGroups)
 {
 	GroupPath  *pathnode = makeNode(GroupPath);
+	PathTarget *target = rel->reltarget;
 
 	pathnode->path.pathtype = T_Group;
 	pathnode->path.parent = rel;
@@ -2828,7 +2828,6 @@ GroupingSetsPath *
 create_groupingsets_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 Path *subpath,
-						 PathTarget *target,
 						 List *having_qual,
 						 AggStrategy aggstrategy,
 						 List *rollups,
@@ -2836,6 +2835,7 @@ create_groupingsets_path(PlannerInfo *root,
 						 double numGroups)
 {
 	GroupingSetsPath *pathnode = makeNode(GroupingSetsPath);
+	PathTarget *target = rel->reltarget;
 	ListCell   *lc;
 	bool		is_first = true;
 	bool		is_first_sort = true;
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index ef7173fbf8..381bc30813 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -178,7 +178,6 @@ extern SortPath *create_sort_path(PlannerInfo *root,
 extern GroupPath *create_group_path(PlannerInfo *root,
 				  RelOptInfo *rel,
 				  Path *subpath,
-				  PathTarget *target,
 				  List *groupClause,
 				  List *qual,
 				  double numGroups);
@@ -200,7 +199,6 @@ extern AggPath *create_agg_path(PlannerInfo *root,
 extern GroupingSetsPath *create_groupingsets_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 Path *subpath,
-						 PathTarget *target,
 						 List *having_qual,
 						 AggStrategy aggstrategy,
 						 List *rollups,
-- 
2.14.3 (Apple Git-98)

0002-Determine-grouping-strategies-in-create_grouping_pat.patch (application/octet-stream)

From 6588face643ea2dc7a7b3ac4fea8ffcb064ca29c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:28:45 -0400
Subject: [PATCH 2/3] Determine grouping strategies in create_grouping_paths.

Partition-wise aggregate will call create_ordinary_grouping_paths
multiple times and we don't want to redo this work every time; have
the caller do it instead and pass the details down.

Patch by me, reviewed by Ashutosh Bapat.
---
 src/backend/optimizer/plan/planner.c | 142 ++++++++++++++++++++---------------
 1 file changed, 82 insertions(+), 60 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 59802423fc..7b623496e3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -93,6 +93,25 @@ typedef struct
 	List	   *groupClause;	/* overrides parse->groupClause */
 } standard_qp_extra;
 
+/*
+ * Various flags indicating what kinds of grouping are possible.
+ *
+ * GROUPING_CAN_USE_SORT should be set if it's possible to perform
+ * sort-based implementations of grouping.  When grouping sets are in use,
+ * this will be true if sorting is potentially usable for any of the grouping
+ * sets, even if it's not usable for all of them.
+ *
+ * GROUPING_CAN_USE_HASH should be set if it's possible to perform
+ * hash-based implementations of grouping.
+ *
+ * GROUPING_CAN_PARTIAL_AGG should be set if the aggregation is of a type
+ * for which we support partial aggregation (not, for example, grouping sets).
+ * It says nothing about parallel-safety or the availability of suitable paths.
+ */
+#define GROUPING_CAN_USE_SORT       0x0001
+#define GROUPING_CAN_USE_HASH       0x0002
+#define GROUPING_CAN_PARTIAL_AGG	0x0004
+
 /*
  * Data specific to grouping sets
  */
@@ -149,7 +168,7 @@ static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
-							   grouping_sets_data *gd);
+							   grouping_sets_data *gd, int flags);
 static void consider_groupingsets_paths(PlannerInfo *root,
 							RelOptInfo *grouped_rel,
 							Path *path,
@@ -215,8 +234,8 @@ static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
 							  bool can_hash,
 							  AggClauseCosts *agg_final_costs);
 static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
-static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
+static bool can_partial_agg(PlannerInfo *root,
+				const AggClauseCosts *agg_costs);
 
 
 /*****************************************************************************
@@ -3720,8 +3739,58 @@ create_grouping_paths(PlannerInfo *root,
 	if (is_degenerate_grouping(root))
 		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
 	else
+	{
+		int			flags = 0;
+
+		/*
+		 * Determine whether it's possible to perform sort-based
+		 * implementations of grouping.  (Note that if groupClause is empty,
+		 * grouping_is_sortable() is trivially true, and all the
+		 * pathkeys_contained_in() tests will succeed too, so that we'll
+		 * consider every surviving input path.)
+		 *
+		 * If we have grouping sets, we might be able to sort some but not all
+		 * of them; in this case, we need can_sort to be true as long as we
+		 * must consider any sorted-input plan.
+		 */
+		if ((gd && gd->rollups != NIL)
+			|| grouping_is_sortable(parse->groupClause))
+			flags |= GROUPING_CAN_USE_SORT;
+
+		/*
+		 * Determine whether we should consider hash-based implementations of
+		 * grouping.
+		 *
+		 * Hashed aggregation only applies if we're grouping. If we have
+		 * grouping sets, some groups might be hashable but others not; in
+		 * this case we set can_hash true as long as there is nothing globally
+		 * preventing us from hashing (and we should therefore consider plans
+		 * with hashes).
+		 *
+		 * Executor doesn't support hashed aggregation with DISTINCT or ORDER
+		 * BY aggregates.  (Doing so would imply storing *all* the input
+		 * values in the hash table, and/or running many sorts in parallel,
+		 * either of which seems like a certain loser.)  We similarly don't
+		 * support ordered-set aggregates in hashed aggregation, but that case
+		 * is also included in the numOrderedAggs count.
+		 *
+		 * Note: grouping_is_hashable() is much more expensive to check than
+		 * the other gating conditions, so we want to do it last.
+		 */
+		if ((parse->groupClause != NIL &&
+			 agg_costs->numOrderedAggs == 0 &&
+			 (gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
+			flags |= GROUPING_CAN_USE_HASH;
+
+		/*
+		 * Determine whether partial aggregation is possible.
+		 */
+		if (can_partial_agg(root, agg_costs))
+			flags |= GROUPING_CAN_PARTIAL_AGG;
+
 		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
-									   agg_costs, gd);
+									   agg_costs, gd, flags);
+	}
 
 	set_cheapest(grouped_rel);
 	return grouped_rel;
@@ -3820,15 +3889,15 @@ static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
 							   const AggClauseCosts *agg_costs,
-							   grouping_sets_data *gd)
+							   grouping_sets_data *gd, int flags)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *partially_grouped_rel = NULL;
 	AggClauseCosts agg_final_costs; /* parallel only */
 	double		dNumGroups;
-	bool		can_hash;
-	bool		can_sort;
+	bool		can_hash = (flags & GROUPING_CAN_USE_HASH) != 0;
+	bool		can_sort = (flags & GROUPING_CAN_USE_SORT) != 0;
 
 	/*
 	 * Estimate number of groups.
@@ -3838,50 +3907,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 									  gd,
 									  parse->targetList);
 
-	/*
-	 * Determine whether it's possible to perform sort-based implementations
-	 * of grouping.  (Note that if groupClause is empty,
-	 * grouping_is_sortable() is trivially true, and all the
-	 * pathkeys_contained_in() tests will succeed too, so that we'll consider
-	 * every surviving input path.)
-	 *
-	 * If we have grouping sets, we might be able to sort some but not all of
-	 * them; in this case, we need can_sort to be true as long as we must
-	 * consider any sorted-input plan.
-	 */
-	can_sort = (gd && gd->rollups != NIL)
-		|| grouping_is_sortable(parse->groupClause);
-
-	/*
-	 * Determine whether we should consider hash-based implementations of
-	 * grouping.
-	 *
-	 * Hashed aggregation only applies if we're grouping. If we have grouping
-	 * sets, some groups might be hashable but others not; in this case we set
-	 * can_hash true as long as there is nothing globally preventing us from
-	 * hashing (and we should therefore consider plans with hashes).
-	 *
-	 * Executor doesn't support hashed aggregation with DISTINCT or ORDER BY
-	 * aggregates.  (Doing so would imply storing *all* the input values in
-	 * the hash table, and/or running many sorts in parallel, either of which
-	 * seems like a certain loser.)  We similarly don't support ordered-set
-	 * aggregates in hashed aggregation, but that case is also included in the
-	 * numOrderedAggs count.
-	 *
-	 * Note: grouping_is_hashable() is much more expensive to check than the
-	 * other gating conditions, so we want to do it last.
-	 */
-	can_hash = (parse->groupClause != NIL &&
-				agg_costs->numOrderedAggs == 0 &&
-				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
-
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
 	 * partially grouped paths; that way, later code can easily consider both
 	 * parallel and non-parallel approaches to grouping.
 	 */
 	MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-	if (can_parallel_agg(root, input_rel, grouped_rel, agg_costs))
+	if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL
+		&& (flags & GROUPING_CAN_PARTIAL_AGG) != 0)
 	{
 		partially_grouped_rel =
 			create_partial_grouping_paths(root,
@@ -6490,28 +6523,17 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
 }
 
 /*
- * can_parallel_agg
+ * can_partial_agg
  *
- * Determines whether or not parallel grouping and/or aggregation is possible.
+ * Determines whether or not partial grouping and/or aggregation is possible.
  * Returns true when possible, false otherwise.
  */
 static bool
-can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
-				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs)
+can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 {
 	Query	   *parse = root->parse;
 
-	if (!grouped_rel->consider_parallel)
-	{
-		/* Not even parallel-safe. */
-		return false;
-	}
-	else if (input_rel->partial_pathlist == NIL)
-	{
-		/* Nothing to use as input for partial aggregate. */
-		return false;
-	}
-	else if (!parse->hasAggs && parse->groupClause == NIL)
+	if (!parse->hasAggs && parse->groupClause == NIL)
 	{
 		/*
 		 * We don't know how to do parallel aggregation unless we have either
-- 
2.14.3 (Apple Git-98)

Attachment: 0001-Defer-creation-of-partially-grouped-relation-until-i.patch (application/octet-stream)

From d4571873038eee9092e5edeab0be77bb106fe06c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 15 Mar 2018 16:08:19 -0400
Subject: [PATCH 1/3] Defer creation of partially-grouped relation until it's
 needed.

This avoids unnecessarily creating a RelOptInfo for which we have no
actual need.  This idea is from Ashutosh Bapat, who wrote a very
different patch to accomplish a similar goal.  It will be more
important if and when we get partition-wise aggregate, since then
there could be many partially grouped relations all of which could
potentially be unnecessary.  In passing, this sets the grouping
relation's reltarget, which wasn't done previously but makes things
simpler for this refactoring.

Along the way, adjust things so that add_paths_to_partial_grouping_rel,
now renamed create_partial_grouping_paths, does not perform the Gather
or Gather Merge steps to generate non-partial paths from partial
paths; have the caller do it instead.  This is again for the
convenience of partition-wise aggregate, which wants to inject
additional partial paths after the initial ones are created and before we decide which ones
to Gather/Gather Merge.  This might seem like a separate change, but
it's actually pretty closely entangled; I couldn't really see much
value in separating it and having to change some things twice.

Patch by me, reviewed by Ashutosh Bapat.
---
 src/backend/optimizer/plan/planner.c | 324 ++++++++++++++++++-----------------
 1 file changed, 170 insertions(+), 154 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9c4a1baf5f..59802423fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -148,7 +148,6 @@ static void create_degenerate_grouping_paths(PlannerInfo *root,
 static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
-							   RelOptInfo *partially_grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd);
 static void consider_groupingsets_paths(PlannerInfo *root,
@@ -208,13 +207,14 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  const AggClauseCosts *agg_final_costs,
 						  grouping_sets_data *gd, bool can_sort, bool can_hash,
 						  double dNumGroups, List *havingQual);
-static void add_paths_to_partial_grouping_rel(PlannerInfo *root,
-								  RelOptInfo *input_rel,
-								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
-								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash);
+static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
+							  RelOptInfo *grouped_rel,
+							  RelOptInfo *input_rel,
+							  grouping_sets_data *gd,
+							  bool can_sort,
+							  bool can_hash,
+							  AggClauseCosts *agg_final_costs);
+static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
 static bool can_parallel_agg(PlannerInfo *root, RelOptInfo *input_rel,
 				 RelOptInfo *grouped_rel, const AggClauseCosts *agg_costs);
 
@@ -3688,42 +3688,30 @@ create_grouping_paths(PlannerInfo *root,
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *grouped_rel;
-	RelOptInfo *partially_grouped_rel;
 
 	/*
 	 * For now, all aggregated paths are added to the (GROUP_AGG, NULL)
-	 * upperrel.  Paths that are only partially aggregated go into the
-	 * (UPPERREL_PARTIAL_GROUP_AGG, NULL) upperrel.
+	 * upperrel.
 	 */
 	grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, NULL);
-	partially_grouped_rel = fetch_upper_rel(root, UPPERREL_PARTIAL_GROUP_AGG,
-											NULL);
+	grouped_rel->reltarget = target;
 
 	/*
 	 * If the input relation is not parallel-safe, then the grouped relation
 	 * can't be parallel-safe, either.  Otherwise, it's parallel-safe if the
-	 * target list and HAVING quals are parallel-safe.  The partially grouped
-	 * relation obeys the same rules.
+	 * target list and HAVING quals are parallel-safe.
 	 */
 	if (input_rel->consider_parallel && target_parallel_safe &&
 		is_parallel_safe(root, (Node *) parse->havingQual))
-	{
 		grouped_rel->consider_parallel = true;
-		partially_grouped_rel->consider_parallel = true;
-	}
 
 	/*
-	 * If the input rel belongs to a single FDW, so does the grouped rel. Same
-	 * for the partially_grouped_rel.
+	 * If the input rel belongs to a single FDW, so does the grouped rel.
 	 */
 	grouped_rel->serverid = input_rel->serverid;
 	grouped_rel->userid = input_rel->userid;
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
-	partially_grouped_rel->serverid = input_rel->serverid;
-	partially_grouped_rel->userid = input_rel->userid;
-	partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
-	partially_grouped_rel->fdwroutine = input_rel->fdwroutine;
 
 	/*
 	 * Create either paths for a degenerate grouping or paths for ordinary
@@ -3733,7 +3721,7 @@ create_grouping_paths(PlannerInfo *root,
 		create_degenerate_grouping_paths(root, input_rel, target, grouped_rel);
 	else
 		create_ordinary_grouping_paths(root, input_rel, target, grouped_rel,
-									   partially_grouped_rel, agg_costs, gd);
+									   agg_costs, gd);
 
 	set_cheapest(grouped_rel);
 	return grouped_rel;
@@ -3831,18 +3819,16 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 							   PathTarget *target, RelOptInfo *grouped_rel,
-							   RelOptInfo *partially_grouped_rel,
 							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd)
 {
 	Query	   *parse = root->parse;
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
-	AggClauseCosts agg_partial_costs;	/* parallel only */
+	RelOptInfo *partially_grouped_rel = NULL;
 	AggClauseCosts agg_final_costs; /* parallel only */
 	double		dNumGroups;
 	bool		can_hash;
 	bool		can_sort;
-	bool		try_parallel_aggregation;
 
 	/*
 	 * Estimate number of groups.
@@ -3889,59 +3875,24 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 				agg_costs->numOrderedAggs == 0 &&
 				(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause)));
 
-	/*
-	 * Figure out whether a PartialAggregate/Finalize Aggregate execution
-	 * strategy is viable.
-	 */
-	try_parallel_aggregation = can_parallel_agg(root, input_rel, grouped_rel,
-												agg_costs);
-
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
-	 * partial paths for partially_grouped_rel; that way, later code can
-	 * easily consider both parallel and non-parallel approaches to grouping.
+	 * partially grouped paths; that way, later code can easily consider both
+	 * parallel and non-parallel approaches to grouping.
 	 */
-	if (try_parallel_aggregation)
+	MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
+	if (can_parallel_agg(root, input_rel, grouped_rel, agg_costs))
 	{
-		PathTarget *partial_grouping_target;
-
-		/*
-		 * Build target list for partial aggregate paths.  These paths cannot
-		 * just emit the same tlist as regular aggregate paths, because (1) we
-		 * must include Vars and Aggrefs needed in HAVING, which might not
-		 * appear in the result tlist, and (2) the Aggrefs must be set in
-		 * partial mode.
-		 */
-		partial_grouping_target = make_partial_grouping_target(root, target,
-															   (Node *) parse->havingQual);
-		partially_grouped_rel->reltarget = partial_grouping_target;
-
-		/*
-		 * Collect statistics about aggregates for estimating costs of
-		 * performing aggregation in parallel.
-		 */
-		MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
-		MemSet(&agg_final_costs, 0, sizeof(AggClauseCosts));
-		if (parse->hasAggs)
-		{
-			/* partial phase */
-			get_agg_clause_costs(root, (Node *) partial_grouping_target->exprs,
-								 AGGSPLIT_INITIAL_SERIAL,
-								 &agg_partial_costs);
-
-			/* final phase */
-			get_agg_clause_costs(root, (Node *) target->exprs,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-			get_agg_clause_costs(root, parse->havingQual,
-								 AGGSPLIT_FINAL_DESERIAL,
-								 &agg_final_costs);
-		}
-
-		add_paths_to_partial_grouping_rel(root, input_rel,
-										  partially_grouped_rel,
-										  &agg_partial_costs,
-										  gd, can_sort, can_hash);
+		partially_grouped_rel =
+			create_partial_grouping_paths(root,
+										  grouped_rel,
+										  input_rel,
+										  gd,
+										  can_sort,
+										  can_hash,
+										  &agg_final_costs);
+		gather_grouping_paths(root, partially_grouped_rel);
+		set_cheapest(partially_grouped_rel);
 	}
 
 	/* Build final grouping paths */
@@ -6189,46 +6140,49 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 		 * Instead of operating directly on the input relation, we can
 		 * consider finalizing a partially aggregated path.
 		 */
-		foreach(lc, partially_grouped_rel->pathlist)
+		if (partially_grouped_rel != NULL)
 		{
-			Path	   *path = (Path *) lfirst(lc);
-
-			/*
-			 * Insert a Sort node, if required.  But there's no point in
-			 * sorting anything but the cheapest path.
-			 */
-			if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+			foreach(lc, partially_grouped_rel->pathlist)
 			{
-				if (path != partially_grouped_rel->cheapest_total_path)
-					continue;
-				path = (Path *) create_sort_path(root,
-												 grouped_rel,
-												 path,
-												 root->group_pathkeys,
-												 -1.0);
-			}
+				Path	   *path = (Path *) lfirst(lc);
 
-			if (parse->hasAggs)
-				add_path(grouped_rel, (Path *)
-						 create_agg_path(root,
-										 grouped_rel,
-										 path,
-										 target,
-										 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
-										 AGGSPLIT_FINAL_DESERIAL,
-										 parse->groupClause,
-										 havingQual,
-										 agg_final_costs,
-										 dNumGroups));
-			else
-				add_path(grouped_rel, (Path *)
-						 create_group_path(root,
-										   grouped_rel,
-										   path,
-										   target,
-										   parse->groupClause,
-										   havingQual,
-										   dNumGroups));
+				/*
+				 * Insert a Sort node, if required.  But there's no point in
+				 * sorting anything but the cheapest path.
+				 */
+				if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+				{
+					if (path != partially_grouped_rel->cheapest_total_path)
+						continue;
+					path = (Path *) create_sort_path(root,
+													 grouped_rel,
+													 path,
+													 root->group_pathkeys,
+													 -1.0);
+				}
+
+				if (parse->hasAggs)
+					add_path(grouped_rel, (Path *)
+							 create_agg_path(root,
+											 grouped_rel,
+											 path,
+											 target,
+											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+											 AGGSPLIT_FINAL_DESERIAL,
+											 parse->groupClause,
+											 havingQual,
+											 agg_final_costs,
+											 dNumGroups));
+				else
+					add_path(grouped_rel, (Path *)
+							 create_group_path(root,
+											   grouped_rel,
+											   path,
+											   target,
+											   parse->groupClause,
+											   havingQual,
+											   dNumGroups));
+			}
 		}
 	}
 
@@ -6279,10 +6233,10 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 
 		/*
 		 * Generate a Finalize HashAgg Path atop of the cheapest partially
-		 * grouped path. Once again, we'll only do this if it looks as though
-		 * the hash table won't exceed work_mem.
+		 * grouped path, assuming there is one. Once again, we'll only do this
+		 * if it looks as though the hash table won't exceed work_mem.
 		 */
-		if (partially_grouped_rel->pathlist)
+		if (partially_grouped_rel && partially_grouped_rel->pathlist)
 		{
 			Path	   *path = partially_grouped_rel->cheapest_total_path;
 
@@ -6307,29 +6261,83 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 }
 
 /*
- * add_paths_to_partial_grouping_rel
+ * create_partial_grouping_paths
  *
- * First, generate partially aggregated partial paths from the partial paths
- * for the input relation, and then generate partially aggregated non-partial
- * paths using Gather or Gather Merge.  All paths for this relation -- both
- * partial and non-partial -- have been partially aggregated but require a
- * subsequent FinalizeAggregate step.
+ * Create a new upper relation representing the result of partial aggregation
+ * and populate it with appropriate paths.  Note that we don't finalize the
+ * lists of paths here, so the caller can add additional partial or non-partial
+ * paths and must afterward call gather_grouping_paths and set_cheapest on
+ * the returned upper relation.
+ *
+ * All paths for this new upper relation -- both partial and non-partial --
+ * have been partially aggregated but require a subsequent FinalizeAggregate
+ * step.
  */
-static void
-add_paths_to_partial_grouping_rel(PlannerInfo *root,
-								  RelOptInfo *input_rel,
-								  RelOptInfo *partially_grouped_rel,
-								  AggClauseCosts *agg_partial_costs,
-								  grouping_sets_data *gd,
-								  bool can_sort,
-								  bool can_hash)
+static RelOptInfo *
+create_partial_grouping_paths(PlannerInfo *root,
+							  RelOptInfo *grouped_rel,
+							  RelOptInfo *input_rel,
+							  grouping_sets_data *gd,
+							  bool can_sort,
+							  bool can_hash,
+							  AggClauseCosts *agg_final_costs)
 {
 	Query	   *parse = root->parse;
+	RelOptInfo *partially_grouped_rel;
+	AggClauseCosts agg_partial_costs;
 	Path	   *cheapest_partial_path = linitial(input_rel->partial_pathlist);
 	Size		hashaggtablesize;
 	double		dNumPartialGroups = 0;
 	ListCell   *lc;
 
+	/*
+	 * Build a new upper relation to represent the result of partially
+	 * aggregating the rows from the input relation.
+	 */
+	partially_grouped_rel = fetch_upper_rel(root,
+											UPPERREL_PARTIAL_GROUP_AGG,
+											grouped_rel->relids);
+	partially_grouped_rel->consider_parallel =
+		grouped_rel->consider_parallel;
+	partially_grouped_rel->serverid = grouped_rel->serverid;
+	partially_grouped_rel->userid = grouped_rel->userid;
+	partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
+	partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+
+	/*
+	 * Build target list for partial aggregate paths.  These paths cannot just
+	 * emit the same tlist as regular aggregate paths, because (1) we must
+	 * include Vars and Aggrefs needed in HAVING, which might not appear in
+	 * the result tlist, and (2) the Aggrefs must be set in partial mode.
+	 */
+	partially_grouped_rel->reltarget =
+		make_partial_grouping_target(root, grouped_rel->reltarget,
+									 (Node *) parse->havingQual);
+
+	/*
+	 * Collect statistics about aggregates for estimating costs of performing
+	 * aggregation in parallel.
+	 */
+	MemSet(&agg_partial_costs, 0, sizeof(AggClauseCosts));
+	if (parse->hasAggs)
+	{
+		List	   *partial_target_exprs;
+
+		/* partial phase */
+		partial_target_exprs = partially_grouped_rel->reltarget->exprs;
+		get_agg_clause_costs(root, (Node *) partial_target_exprs,
+							 AGGSPLIT_INITIAL_SERIAL,
+							 &agg_partial_costs);
+
+		/* final phase */
+		get_agg_clause_costs(root, (Node *) grouped_rel->reltarget->exprs,
+							 AGGSPLIT_FINAL_DESERIAL,
+							 agg_final_costs);
+		get_agg_clause_costs(root, parse->havingQual,
+							 AGGSPLIT_FINAL_DESERIAL,
+							 agg_final_costs);
+	}
+
 	/* Estimate number of partial groups. */
 	dNumPartialGroups = get_number_of_groups(root,
 											 cheapest_partial_path->rows,
@@ -6372,7 +6380,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 													 AGGSPLIT_INITIAL_SERIAL,
 													 parse->groupClause,
 													 NIL,
-													 agg_partial_costs,
+													 &agg_partial_costs,
 													 dNumPartialGroups));
 				else
 					add_partial_path(partially_grouped_rel, (Path *)
@@ -6394,7 +6402,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 
 		hashaggtablesize =
 			estimate_hashagg_tablesize(cheapest_partial_path,
-									   agg_partial_costs,
+									   &agg_partial_costs,
 									   dNumPartialGroups);
 
 		/*
@@ -6412,7 +6420,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 											 AGGSPLIT_INITIAL_SERIAL,
 											 parse->groupClause,
 											 NIL,
-											 agg_partial_costs,
+											 &agg_partial_costs,
 											 dNumPartialGroups));
 		}
 	}
@@ -6431,20 +6439,32 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 										 input_rel, partially_grouped_rel);
 	}
 
-	/*
-	 * Try adding Gather or Gather Merge to partial paths to produce
-	 * non-partial paths.
-	 */
-	generate_gather_paths(root, partially_grouped_rel, true);
+	return partially_grouped_rel;
+}
 
-	/* Get cheapest partial path from partially_grouped_rel */
-	cheapest_partial_path = linitial(partially_grouped_rel->partial_pathlist);
+/*
+ * Generate Gather and Gather Merge paths for a grouping relation or partial
+ * grouping relation.
+ *
+ * generate_gather_paths does most of the work, but we also consider a special
+ * case: we could try sorting the data by the group_pathkeys and then applying
+ * Gather Merge.
+ *
+ * NB: This function shouldn't be used for anything other than a grouped or
+ * partially grouped relation not only because of the fact that it explicitly
+ * references group_pathkeys but also because we pass "true" as the third
+ * argument to generate_gather_paths().
+ */
+static void
+gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+	Path	   *cheapest_partial_path;
 
-	/*
-	 * generate_gather_paths won't consider sorting the cheapest path to match
-	 * the group keys and then applying a Gather Merge node to the result;
-	 * that might be a winning strategy.
-	 */
+	/* Try Gather for unordered paths and Gather Merge for ordered ones. */
+	generate_gather_paths(root, rel, true);
+
+	/* Try cheapest partial path + explicit Sort + Gather Merge. */
+	cheapest_partial_path = linitial(rel->partial_pathlist);
 	if (!pathkeys_contained_in(root->group_pathkeys,
 							   cheapest_partial_path->pathkeys))
 	{
@@ -6453,24 +6473,20 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 
 		total_groups =
 			cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
-		path = (Path *) create_sort_path(root, partially_grouped_rel,
-										 cheapest_partial_path,
+		path = (Path *) create_sort_path(root, rel, cheapest_partial_path,
 										 root->group_pathkeys,
 										 -1.0);
 		path = (Path *)
 			create_gather_merge_path(root,
-									 partially_grouped_rel,
+									 rel,
 									 path,
-									 partially_grouped_rel->reltarget,
+									 rel->reltarget,
 									 root->group_pathkeys,
 									 NULL,
 									 &total_groups);
 
-		add_path(partially_grouped_rel, path);
+		add_path(rel, path);
 	}
-
-	/* Now choose the best path(s) */
-	set_cheapest(partially_grouped_rel);
 }
 
 /*
-- 
2.14.3 (Apple Git-98)

#129

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#127)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This patch also renames can_parallel_agg to
can_partial_agg and removes the parallelism-specific bits from it.

I think we need to update the comments in this function to use phrase
"partial aggregation" instead of "parallel aggregation". And I think
we need to change the conditions as well. For example if
parse->groupClause == NIL, why can't we do partial aggregation? This
is the classical case when we will need partial aggregation. Probably
we should test this with Jeevan's patches for partition-wise aggregate
to see if it considers partition-wise aggregate or not.

I think the case where we have neither any aggregates nor a grouping
clause is where we are doing SELECT DISTINCT. Something like SELECT
COUNT(*) FROM ... is not this case because that has an aggregate.

I'm sort of on the fence as to whether and how to update the comments.
I agree that it's a little strange to leave the comments here
referencing parallel aggregation when the function has been renamed to
can_partial_agg(), but a simple search-and-replace doesn't necessarily
improve the situation very much. Most notably, hasNonSerial is
irrelevant for partial but non-parallel aggregation, but we still have
the test because we haven't done the work to really do the right thing
here, which is to separately track whether we can do parallel partial
aggregation (either hasNonPartial or hasNonSerial is a problem) and
non-parallel partial aggregation (only hasNonPartial is a problem).
This needs a deeper reassessment, but I don't think that can or should
be something we try to do right now.
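
The distinction described above can be sketched as two separate predicates. This is only an illustration, not code from the patch set: the struct AggSupport and both function names are hypothetical, and only the two field names hasNonPartial and hasNonSerial come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two aggregate-cost fields discussed above. */
typedef struct AggSupport
{
    bool hasNonPartial;     /* some aggregate cannot be partially aggregated */
    bool hasNonSerial;      /* some aggregate state cannot be serialized */
} AggSupport;

/*
 * Partial aggregation inside a single backend: transition states never
 * leave the process, so only hasNonPartial is a problem.
 */
bool
can_partial_agg_local(const AggSupport *agg)
{
    assert(agg != NULL);
    return !agg->hasNonPartial;
}

/*
 * Parallel partial aggregation: transition states are passed between
 * processes, so either flag is a problem.
 */
bool
can_partial_agg_parallel(const AggSupport *agg)
{
    assert(agg != NULL);
    return !agg->hasNonPartial && !agg->hasNonSerial;
}
```

With the single combined test that the patch keeps for now, an aggregate whose state cannot be serialized blocks both cases; with two predicates it would block only the parallel one.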

OR When parse->groupingSets is true, I can see why we can't use
parallel query, but we can still compute partial aggregates. This
condition doesn't hurt since partition-wise aggregation bails out when
there are grouping sets, so it's not that harmful here.

I haven't thought deeply about what will break when GROUPING SETS are
in use, but it's not the purpose of this patch set to make them work
where they didn't before. The point of hoisting the first two tests
out of this function was just to avoid doing repeated work when
partition-wise aggregate is in use. Those two tests could conceivably
produce different results for different children, whereas the
remaining tests won't give different answers. Let's not get
distracted by the prospect of improving the tests. I suspect that's
not as simple to achieve as you seem to be speculating here...
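
A rough sketch of the hoisting described above, under stated assumptions: the flag names follow the patch, but their numeric values, the RelState struct and both function names are made up for illustration. Query-level tests are computed once into a bitmask, while the two per-relation tests are repeated for each relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names follow the patch; the numeric values are assumptions. */
#define GROUPING_CAN_USE_SORT       0x0001
#define GROUPING_CAN_USE_HASH       0x0002
#define GROUPING_CAN_PARTIAL_AGG    0x0004

/*
 * Hypothetical stand-in for the per-relation state the hoisted tests read.
 * In the patch these come from grouped_rel->consider_parallel and
 * input_rel->partial_pathlist respectively.
 */
typedef struct RelState
{
    bool   consider_parallel;
    size_t num_partial_paths;
} RelState;

/* Query-level checks: computed once and shared by every child. */
int
compute_grouping_flags(bool can_sort, bool can_hash, bool can_partial_agg)
{
    int flags = 0;

    if (can_sort)
        flags |= GROUPING_CAN_USE_SORT;
    if (can_hash)
        flags |= GROUPING_CAN_USE_HASH;
    if (can_partial_agg)
        flags |= GROUPING_CAN_PARTIAL_AGG;
    return flags;
}

/* Per-relation checks: may differ between children, so test each time. */
bool
should_build_partial_paths(const RelState *rel, int flags)
{
    assert(rel != NULL);
    return rel->consider_parallel &&
        rel->num_partial_paths > 0 &&
        (flags & GROUPING_CAN_PARTIAL_AGG) != 0;
}
```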

I am sort of unclear whether we need/want GroupPathExtraData at all.
What determines whether something gets passed via GroupPathExtraData
or just as a separate argument? If we have a rule that stuff that is
common to all child grouped rels goes in there and other stuff
doesn't, or stuff postgres_fdw needs goes in there and other stuff
doesn't, then that might be OK. But I'm not sure that there is such a
rule in the v20 patches.

We have a single FDW hook for all the upper relations and that hook
can not accept grouping specific arguments. Either we need a separate
FDW hook for grouping OR we need some way of passing upper relation
specific information down to an FDW. I think some FDWs and extensions
will be happy if we provide them readymade decisions for can_sort,
can_hash, can_partial_agg etc. It will be good if they don't have to
translate the grouping target and havingQual for every child twice,
once for core and second time in the FDW. In all it looks like we need
some structure to hold that information so that we can pass it down
the hook. I am fine with two structures, one variable and the other
invariable. An upper operation can have one of them or both.

I'm fine with using a structure to bundle details that need to be sent
to the FDW, but why should the FDW need to know about
can_sort/can_hash? I suppose if it needs to know about
can_partial_agg then it doesn't really cost anything to pass through
all the flags, but I doubt whether the FDW has any use for the others.
Anyway, if that's the goal, let's just make sure that the set of
things we're passing that way are exactly the set of things that we
think the FDW might need.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#130

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#129)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 19, 2018 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

This patch also renames can_parallel_agg to
can_partial_agg and removes the parallelism-specific bits from it.

I think we need to update the comments in this function to use phrase
"partial aggregation" instead of "parallel aggregation". And I think
we need to change the conditions as well. For example if
parse->groupClause == NIL, why can't we do partial aggregation? This
is the classical case when we will need partial aggregation. Probably
we should test this with Jeevan's patches for partition-wise aggregate
to see if it considers partition-wise aggregate or not.

I think the case where we have neither any aggregates nor a grouping
clause is where we are doing SELECT DISTINCT.

That's a case which will also benefit from partial partition-wise
grouping where each partition produces its own distinct values, thus
reducing the number of rows that flow through an Append node or better
over network, and finalization step removes duplicates. In fact, what
we are attempting here is partition-wise grouping and partial
aggregation happens to be a requirement for doing that if there are
aggregates. If there are no aggregates, we still benefit from
partition-wise grouping. So, can_partial_agg should only tell whether
we can calculate partial aggregates. If there are aggregates present,
and can_partial_agg is false, we can not attempt partition-wise
grouping. But if there are no aggregates and can_partial_agg is false,
it shouldn't prohibit us from using partition-wise aggregates.
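
The rule proposed above could be written down roughly as follows. This is a sketch of the proposal only, not of what the patch currently does; the QueryShape struct and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the two facts the proposed rule depends on. */
typedef struct QueryShape
{
    bool has_aggs;          /* query contains aggregates (parse->hasAggs) */
    bool can_partial_agg;   /* those aggregates can be computed partially */
} QueryShape;

/*
 * Partial partition-wise grouping needs partial aggregation only when
 * there are aggregates to compute.  Pure grouping (for example the
 * SELECT DISTINCT case) can always be done per partition and finalized
 * by removing duplicates above the Append.
 */
bool
can_try_partial_partitionwise_grouping(const QueryShape *q)
{
    assert(q != NULL);
    if (!q->has_aggs)
        return true;
    return q->can_partial_agg;
}
```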

Something like SELECT
COUNT(*) FROM ... is not this case because that has an aggregate.

I'm sort of on the fence as to whether and how to update the comments.
I agree that it's a little strange to leave the comments here
referencing parallel aggregation when the function has been renamed to
can_partial_agg(), but a simple search-and-replace doesn't necessarily
improve the situation very much.

Hmm, agreed. And as you have mentioned downthread, it's not part of
this patchset to attempt to do that. Maybe I will try providing a
patch to update comments once we have committed PWA.

Most notably, hasNonSerial is
irrelevant for partial but non-parallel aggregation, but we still have
the test because we haven't done the work to really do the right thing
here, which is to separately track whether we can do parallel partial
aggregation (either hasNonPartial or hasNonSerial is a problem) and
non-parallel partial aggregation (only hasNonPartial is a problem).
This needs a deeper reassessment, but I don't think that can or should
be something we try to do right now.

I think this bit is easy to mention in the comments. We could always
say that since partial aggregation uses serialization and
deserialization while calculating partial aggregates, even though the
step is not needed, we need hasNonSerial to be false for partial
aggregation to work. A future improvement to avoid serialization and
deserialization in partial aggregation when no parallel query is
involved should remove this condition from here and treat it as a
requirement for parallel aggregation.

OR When parse->groupingSets is true, I can see why we can't use
parallel query, but we can still compute partial aggregates. This
condition doesn't hurt since partition-wise aggregation bails out when
there are grouping sets, so it's not that harmful here.

I haven't thought deeply about what will break when GROUPING SETS are
in use, but it's not the purpose of this patch set to make them work
where they didn't before. The point of hoisting the first two tests
out of this function was just to avoid doing repeated work when
partition-wise aggregate is in use. Those two tests could conceivably
produce different results for different children, whereas the
remaining tests won't give different answers. Let's not get
distracted by the prospect of improving the tests. I suspect that's
not as simple to achieve as you seem to be speculating here...

+1.

I'm fine with using a structure to bundle details that need to be sent
to the FDW, but why should the FDW need to know about
can_sort/can_hash? I suppose if it needs to know about
can_partial_agg then it doesn't really cost anything to pass through
all the flags, but I doubt whether the FDW has any use for the others.
Anyway, if that's the goal, let's just make sure that the set of
things we're passing that way are exactly the set of things that we
think the FDW might need.

I am speculating that an FDW or custom planner hook that does some
local processing, or hints something to the foreign server, could use
those flags. But I agree that unless we see such a requirement we
shouldn't expose those in the structure. It would be difficult to
remove them later.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#131

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#128)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi,

On Mon, Mar 19, 2018 at 10:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Ok. That looks good.

Here's an updated version. In this version, based on a voice
discussion with Ashutosh and Jeevan, I adjusted 0001 to combine it
with an earlier idea of splitting Gather/Gather Merge path generation
out of the function that creates partially aggregated paths. The idea
here is that create_ordinary_grouping_paths() could first call
create_partial_grouping_paths(), then add additional paths which might
be partial or non-partial by invoking the partition-wise aggregate
logic, then call gather_grouping_paths() and set_cheapest() to
finalize the partially grouped rel. Also, I added draft commit
messages.

I have added all these three patches in the attached patch-set and rebased
my changes over it.

However, I have not yet made this patch-set dependent on the UPPERREL_TLIST
changes you have proposed on another mail thread, and thus it has the 0001
patch refactoring the scanjoin issue.
0002, 0003 and 0004 are your patches added to this patch-set.
0005 and 0006 are further refactoring patches. 0006 adds a
GroupPathExtraData which stores mostly child-variant data and costs.
0007 is the main partitionwise aggregation patch, rebased
accordingly.
0008 contains test cases and 0009 contains the FDW changes.

Let me know if I missed any point to consider while rebasing.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#132

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#131)

3 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I have added all these three patches in the attached patch-set and rebased
my changes over it.

However, I have not yet made this patch-set dependent on the UPPERREL_TLIST
changes you have proposed on another mail thread, and thus it has the 0001
patch refactoring the scanjoin issue.
0002, 0003 and 0004 are your patches added to this patch-set.
0005 and 0006 are further refactoring patches. 0006 adds a
GroupPathExtraData which stores mostly child-variant data and costs.
0007 is the main partitionwise aggregation patch, rebased
accordingly.
0008 contains test cases and 0009 contains the FDW changes.

Committed my refactoring patches (your 0002-0004).

Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
like what happens is: we first build an Append path for the topmost
scan/join rel. That uses paths from the individual relations that
don't necessarily produce the final scan/join target. Then we mutate
those relations in place during partition-wise aggregate so that they
now do produce the final scan/join target and generate some more paths
using the results. So there's an ordering dependency, and the same
pathlist represents different things at different times. That is, I
suppose, not technically any worse than what we're doing for the
scan/join rel's pathlist in general, but here there's the additional
complexity that the paths get used both before and after being
mutated. The UPPERREL_TLIST proposal would clean this up, although I
realize that has unresolved issues.

In create_partial_grouping_paths, the loop that does "for (i = 0; i <
2; i++)" is not exactly what I had in mind when I said that we should
use two loops. I did not mean a loop with two iterations. I meant
adding a loop like foreach(lc, input_rel->pathlist) in each place
where we currently have a loop like
foreach(input_rel->partial_pathlist). See 0001, attached.

Don't write if (a) Assert(b) but rather Assert(!a || b). See 0002, attached.
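
As a minimal sketch of that rule, using the standard C assert() rather than
PostgreSQL's Assert() macro: the implication "a implies b" is written as the
single expression !a || b, so the whole check sits inside the assertion and,
presumably, vanishes cleanly in builds without assertions. DemoExtra,
implication_holds() and check_partial_rel() are hypothetical names used only
for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's extra data; not the real struct. */
typedef struct DemoExtra
{
    bool        perform_partial;    /* plays the role of "a" in the rule */
} DemoExtra;

/* "a implies b" is logically the same as "!a || b". */
static bool
implication_holds(bool a, bool b)
{
    return !a || b;
}

/*
 * Preferred style: one assertion containing the whole implication, instead
 * of "if (a) assert(b);".  With NDEBUG defined, nothing is left behind.
 */
static void
check_partial_rel(const DemoExtra *extra, const void *partially_grouped_rel)
{
    assert(!extra->perform_partial || partially_grouped_rel != NULL);
}
```

The only combination that trips the assertion is a true and b false, which is
exactly the case the original if/Assert pair was guarding against.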

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
wrong. If can_partial_agg() isn't accurately determining whether
partial aggregation is possible, and as Ashutosh and I have been
discussing, there's room for improvement in that area, then that's a
topic for some other set of patches. Also, the test in
create_ordinary_grouping_paths for whether or not to call
create_partial_grouping_paths() is super-complicated and uncommented.
I think a simpler approach is to allow create_partial_grouping_paths()
the option of returning NULL. See 0003, attached.
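
A simplified sketch of that calling convention, assuming nothing about the
real planner structures: the caller tests only the "can partial agg" flag, and
the callee decides for itself whether there is anything worth building,
returning NULL if not. DemoRel, demo_create_partial_rel() and
demo_count_partial_paths() are hypothetical names, and the boolean parameters
stand in for the conditions that 0003 actually checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a partially grouped RelOptInfo. */
typedef struct DemoRel
{
    int         npaths;
} DemoRel;

/*
 * Returns NULL when there is no input to aggregate partially and
 * partitionwise aggregation is not a possibility either; otherwise fills
 * in and returns the caller-provided rel.
 */
static DemoRel *
demo_create_partial_rel(DemoRel *storage,
                        bool have_total_input,
                        bool have_partial_input,
                        bool partitionwise_possible)
{
    if (!have_total_input && !have_partial_input && !partitionwise_possible)
        return NULL;

    storage->npaths = (have_total_input ? 1 : 0) + (have_partial_input ? 1 : 0);
    return storage;
}

/* Caller side: a single simple test, then cope with a NULL result. */
static int
demo_count_partial_paths(bool can_partial_agg,
                         bool have_total_input,
                         bool have_partial_input,
                         bool partitionwise_possible)
{
    DemoRel     storage = {0};
    DemoRel    *rel = NULL;

    if (can_partial_agg)
        rel = demo_create_partial_rel(&storage, have_total_input,
                                      have_partial_input,
                                      partitionwise_possible);

    return (rel != NULL) ? rel->npaths : 0;
}
```

The point of the design is that the complicated conditions live next to the
code that knows why they matter, and the call site stays readable.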

make_grouping_rel() claims that "For now, all aggregated paths are
added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
longer have only one grouped upper rel.

I'm having a heck of a time understanding what is_partial_aggregation
and perform_partial_partitionwise_aggregation are supposed to be
doing. It seems like is_partial_aggregation means that we should ONLY
do partial aggregation, which is not exactly what the name implies.
It also seems like perform_partial_partitionwise_aggregation and
is_partial_aggregation differ only in that they apply to the current
level and to the child level respectively; can't we merge these
somehow so that we don't need both of them?

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Allow-create_partial_grouping_paths-to-return-NULL.patch (application/octet-stream)

From 584d521a38dacdf1364928d91b479c761d4d1b4e Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 20 Mar 2018 15:02:23 -0400
Subject: [PATCH 3/3] Allow create_partial_grouping_paths to return NULL.

---
 src/backend/optimizer/plan/planner.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fda32b310c..b2627f5288 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3928,11 +3928,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	 * partially grouped paths; that way, later code can easily consider both
 	 * parallel and non-parallel approaches to grouping.
 	 */
-	if ((grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL
-		 && (extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0) ||
-		((extra->flags & GROUPING_CAN_PARTITIONWISE_AGG) != 0
-		 && extra->perform_partial_partitionwise_aggregation) ||
-		extra->is_partial_aggregation)
+	if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
 		partially_grouped_rel =
 			create_partial_grouping_paths(root,
 										  grouped_rel,
@@ -6366,6 +6362,9 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
  * All paths for this new upper relation -- both partial and non-partial --
  * have been partially aggregated but require a subsequent FinalizeAggregate
  * step.
+ *
+ * NB: This function is allowed to return NULL if it determines that there is
+ * no real need to create a new RelOptInfo.
  */
 static RelOptInfo *
 create_partial_grouping_paths(PlannerInfo *root,
@@ -6405,6 +6404,19 @@ create_partial_grouping_paths(PlannerInfo *root,
 	if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL)
 		cheapest_partial_path = linitial(input_rel->partial_pathlist);
 
+	/*
+	 * If we can't partially aggregate partial paths, and we can't partially
+	 * aggregate non-partial paths, then there may not be any point to creating
+	 * a new RelOptInfo after all.  However, if partition-wise aggregate is
+	 * a possibility, then we need to create the RelOptInfo anyway, because the
+	 * caller may want to add Append paths to it.
+	 */
+	if (cheapest_total_path == NULL &&
+		cheapest_partial_path == NULL &&
+		((extra->flags & GROUPING_CAN_PARTITIONWISE_AGG) == 0 ||
+		!IS_OTHER_REL(input_rel)))
+		return NULL;
+
 	/*
 	 * Build a new upper relation to represent the result of partially
 	 * aggregating the rows from the input relation.
-- 
2.14.3 (Apple Git-98)

0002-Fix-Assert-style.patch (application/octet-stream)

From e3ccf21f5dc32f876ea8f981febc5fcad4c340f8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 20 Mar 2018 14:17:03 -0400
Subject: [PATCH 2/3] Fix Assert style.

---
 src/backend/optimizer/plan/planner.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c05182d479..fda32b310c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6933,8 +6933,8 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	 * If partial partitionwise aggregation needs to be performed, then we
 	 * must have created a partially_grouped_rel already.
 	 */
-	if (extra->perform_partial_partitionwise_aggregation)
-		Assert(partially_grouped_rel != NULL);
+	Assert(!extra->perform_partial_partitionwise_aggregation ||
+		   partially_grouped_rel != NULL);
 
 	/*
 	 * For full aggregation or grouping, each partition produces a disjoint
-- 
2.14.3 (Apple Git-98)

0001-Remove-loop-from-0-to-1.patch (application/octet-stream)

From 54a8a61abb3c629196e6e9a778f8527103df26c2 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 20 Mar 2018 14:12:58 -0400
Subject: [PATCH 1/3] Remove loop from 0 to 1.

---
 src/backend/optimizer/plan/planner.c | 289 +++++++++++++++++++++--------------
 1 file changed, 171 insertions(+), 118 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4c11d84fd6..c05182d479 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6378,14 +6378,32 @@ create_partial_grouping_paths(PlannerInfo *root,
 	RelOptInfo *partially_grouped_rel;
 	AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
-	Path	   *cheapest_path;
-	Size		hashaggtablesize;
+	Path	   *cheapest_partial_path = NULL;
+	Path	   *cheapest_total_path = NULL;
 	double		dNumPartialGroups = 0;
+	double		dNumPartialPartialGroups = 0;
 	ListCell   *lc;
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
-	bool		use_partial_pathlist;
-	int			i;
+
+	/*
+	 * Consider whether we should generate partially aggregated non-partial
+	 * paths.  We can only do this if we have a non-partial path, and in
+	 * addition the caller must have requested it by setting
+	 * extra->is_partial_aggregation or
+	 * extra->perform_partial_partitionwise_aggregation.
+	 */
+	if (input_rel->pathlist != NIL && (extra->is_partial_aggregation ||
+		extra->perform_partial_partitionwise_aggregation))
+		cheapest_total_path = input_rel->cheapest_total_path;
+
+	/*
+	 * If parallelism is possible for grouped_rel, then we should consider
+	 * generating partially-grouped partial paths.  However, if the input rel
+	 * has no partial paths, then we can't.
+	 */
+	if (grouped_rel->consider_parallel && input_rel->partial_pathlist != NIL)
+		cheapest_partial_path = linitial(input_rel->partial_pathlist);
 
 	/*
 	 * Build a new upper relation to represent the result of partially
@@ -6443,137 +6461,172 @@ create_partial_grouping_paths(PlannerInfo *root,
 		extra->partial_costs_set = true;
 	}
 
-	/*
-	 * We loop twice, one to generate paths for partial_pathlist when parallel
-	 * paths are possible and second time for generating paths in pathlist when
-	 * we need partially aggregated results for the partitionwise grouping
-	 * and/or aggregation.
-	 */
-	use_partial_pathlist = true;
-	for (i = 0; i < 2; i++)
+	/* Estimate number of partial groups. */
+	if (cheapest_total_path != NULL)
+		dNumPartialGroups =
+			get_number_of_groups(root,
+								 cheapest_total_path->rows,
+								 gd,
+								 extra->targetList);
+	if (cheapest_partial_path != NULL)
+		dNumPartialPartialGroups =
+			get_number_of_groups(root,
+								 cheapest_partial_path->rows,
+								 gd,
+								 extra->targetList);
+
+	if (can_sort && cheapest_total_path != NULL)
 	{
-		if (use_partial_pathlist && !(grouped_rel->consider_parallel &&
-									  input_rel->partial_pathlist != NIL &&
-									  (extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0))
-		{
-			/* No parallel paths possible. */
-			use_partial_pathlist = false;
-			continue;
-		}
+		/* This should have been checked previously */
+		Assert(parse->hasAggs || parse->groupClause);
 
-		if (!use_partial_pathlist && !extra->is_partial_aggregation)
+		/*
+		 * Use any available suitably-sorted path as input, and also consider
+		 * sorting the cheapest partial path.
+		 */
+		foreach(lc, input_rel->pathlist)
 		{
-			/* No partial partitiowise aggregation possible. */
-			continue;
-		}
+			Path	   *path = (Path *) lfirst(lc);
+			bool		is_sorted;
 
-		/* Get either total or partial cheapest path */
-		cheapest_path = use_partial_pathlist ? linitial(input_rel->partial_pathlist) :
-			input_rel->cheapest_total_path;
+			is_sorted = pathkeys_contained_in(root->group_pathkeys,
+											  path->pathkeys);
+			if (path == cheapest_total_path || is_sorted)
+			{
+				/* Sort the cheapest partial path, if it isn't already */
+				if (!is_sorted)
+					path = (Path *) create_sort_path(root,
+													 partially_grouped_rel,
+													 path,
+													 root->group_pathkeys,
+													 -1.0);
 
-		/* Estimate number of partial groups. */
-		dNumPartialGroups = get_number_of_groups(root,
-												 cheapest_path->rows,
-												 gd,
-												 extra->targetList);
+				if (parse->hasAggs)
+					add_path(partially_grouped_rel, (Path *)
+							 create_agg_path(root,
+											 partially_grouped_rel,
+											 path,
+											 partially_grouped_rel->reltarget,
+											 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+											 AGGSPLIT_INITIAL_SERIAL,
+											 parse->groupClause,
+											 NIL,
+											 agg_partial_costs,
+											 dNumPartialGroups));
+				else
+					add_path(partially_grouped_rel, (Path *)
+							 create_group_path(root,
+											   partially_grouped_rel,
+											   path,
+											   parse->groupClause,
+											   NIL,
+											   dNumPartialGroups));
+			}
+		}
+	}
 
-		if (can_sort)
+	if (can_sort && cheapest_partial_path != NULL)
+	{
+		/* Similar to above logic, but for partial paths. */
+		foreach(lc, input_rel->partial_pathlist)
 		{
-			List	   *pathlist;
-
-			/* This should have been checked previously */
-			Assert(parse->hasAggs || parse->groupClause);
-
-			/* Get appropriate pathlist */
-			pathlist = use_partial_pathlist ? input_rel->partial_pathlist :
-				input_rel->pathlist;
+			Path	   *path = (Path *) lfirst(lc);
+			bool		is_sorted;
 
-			/*
-			 * Use any available suitably-sorted path as input, and also
-			 * consider sorting the cheapest path.
-			 */
-			foreach(lc, pathlist)
+			is_sorted = pathkeys_contained_in(root->group_pathkeys,
+											  path->pathkeys);
+			if (path == cheapest_partial_path || is_sorted)
 			{
-				Path	   *path = (Path *) lfirst(lc);
-				bool		is_sorted;
-				Path	   *gpath;
+				/* Sort the cheapest partial path, if it isn't already */
+				if (!is_sorted)
+					path = (Path *) create_sort_path(root,
+													 partially_grouped_rel,
+													 path,
+													 root->group_pathkeys,
+													 -1.0);
 
-				is_sorted = pathkeys_contained_in(root->group_pathkeys,
-												  path->pathkeys);
-				if (path == cheapest_path || is_sorted)
-				{
-					/* Sort the cheapest path, if it isn't already */
-					if (!is_sorted)
-						path = (Path *) create_sort_path(root,
-														 partially_grouped_rel,
-														 path,
-														 root->group_pathkeys,
-														 -1.0);
-
-					if (parse->hasAggs)
-						gpath = (Path *) create_agg_path(root,
-														 partially_grouped_rel,
-														 path,
-														 partially_grouped_rel->reltarget,
-														 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
-														 AGGSPLIT_INITIAL_SERIAL,
-														 parse->groupClause,
-														 NIL,
-														 agg_partial_costs,
-														 dNumPartialGroups);
-					else
-						gpath = (Path *) create_group_path(root,
-														   partially_grouped_rel,
-														   path,
-														   parse->groupClause,
-														   NIL,
-														   dNumPartialGroups);
-
-					if (use_partial_pathlist)
-						add_partial_path(partially_grouped_rel, gpath);
-					else
-						add_path(partially_grouped_rel, gpath);
-				}
+				if (parse->hasAggs)
+					add_partial_path(partially_grouped_rel, (Path *)
+									 create_agg_path(root,
+													 partially_grouped_rel,
+													 path,
+													 partially_grouped_rel->reltarget,
+													 parse->groupClause ? AGG_SORTED : AGG_PLAIN,
+													 AGGSPLIT_INITIAL_SERIAL,
+													 parse->groupClause,
+													 NIL,
+													 agg_partial_costs,
+													 dNumPartialPartialGroups));
+				else
+					add_partial_path(partially_grouped_rel, (Path *)
+									 create_group_path(root,
+													   partially_grouped_rel,
+													   path,
+													   parse->groupClause,
+													   NIL,
+													   dNumPartialPartialGroups));
 			}
 		}
+	}
 
-		if (can_hash)
-		{
-			/* Checked above */
-			Assert(parse->hasAggs || parse->groupClause);
+	if (can_hash && cheapest_total_path != NULL)
+	{
+		Size		hashaggtablesize;
 
-			hashaggtablesize =
-				estimate_hashagg_tablesize(cheapest_path,
-										   agg_partial_costs,
-										   dNumPartialGroups);
+		/* Checked above */
+		Assert(parse->hasAggs || parse->groupClause);
 
-			/*
-			 * Tentatively produce a partial HashAgg Path, depending on if it
-			 * looks as if the hash table will fit in work_mem.
-			 */
-			if (hashaggtablesize < work_mem * 1024L)
-			{
-				Path	   *gpath;
-
-				gpath = (Path *) create_agg_path(root,
-												 partially_grouped_rel,
-												 cheapest_path,
-												 partially_grouped_rel->reltarget,
-												 AGG_HASHED,
-												 AGGSPLIT_INITIAL_SERIAL,
-												 parse->groupClause,
-												 NIL,
-												 agg_partial_costs,
-												 dNumPartialGroups);
-
-				if (use_partial_pathlist)
-					add_partial_path(partially_grouped_rel, gpath);
-				else
-					add_path(partially_grouped_rel, gpath);
-			}
+		hashaggtablesize =
+			estimate_hashagg_tablesize(cheapest_total_path,
+									   agg_partial_costs,
+									   dNumPartialGroups);
+
+		/*
+		 * Tentatively produce a partial HashAgg Path, depending on if it
+		 * looks as if the hash table will fit in work_mem.
+		 */
+		if (hashaggtablesize < work_mem * 1024L &&
+			cheapest_total_path != NULL)
+		{
+			add_path(partially_grouped_rel, (Path *)
+					 create_agg_path(root,
+									 partially_grouped_rel,
+									 cheapest_total_path,
+									 partially_grouped_rel->reltarget,
+									 AGG_HASHED,
+									 AGGSPLIT_INITIAL_SERIAL,
+									 parse->groupClause,
+									 NIL,
+									 agg_partial_costs,
+									 dNumPartialGroups));
 		}
+	}
+
+	if (can_hash && cheapest_partial_path != NULL)
+	{
+		Size		hashaggtablesize;
+
+		hashaggtablesize =
+			estimate_hashagg_tablesize(cheapest_partial_path,
+									   agg_partial_costs,
+									   dNumPartialPartialGroups);
 
-		use_partial_pathlist = false;
+		/* Do the same for partial paths. */
+		if (hashaggtablesize < work_mem * 1024L &&
+			cheapest_partial_path != NULL)
+		{
+			add_partial_path(partially_grouped_rel, (Path *)
+							 create_agg_path(root,
+											 partially_grouped_rel,
+											 cheapest_partial_path,
+											 partially_grouped_rel->reltarget,
+											 AGG_HASHED,
+											 AGGSPLIT_INITIAL_SERIAL,
+											 parse->groupClause,
+											 NIL,
+											 agg_partial_costs,
+											 dNumPartialPartialGroups));
+		}
 	}
 
 	/*
-- 
2.14.3 (Apple Git-98)

#133

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#132)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 21, 2018 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 20, 2018 at 10:46 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I have added all these three patches in the attached patch-set and rebased
my changes over it.

However, I have not yet made this patch-set dependent on the UPPERREL_TLIST
changes you have proposed on another mail thread, and thus it has the 0001
patch refactoring the scanjoin issue.
0002, 0003 and 0004 are your patches added to this patch-set.
0005 and 0006 are further refactoring patches. 0006 adds a
GroupPathExtraData which stores mostly child-variant data and costs.
0007 is the main partitionwise aggregation patch, rebased
accordingly.
0008 contains test cases and 0009 contains the FDW changes.

Committed my refactoring patches (your 0002-0004).

Thanks Robert.

Regarding apply_scanjoin_target_to_paths in 0001 and 0007, it seems
like what happens is: we first build an Append path for the topmost
scan/join rel. That uses paths from the individual relations that
don't necessarily produce the final scan/join target. Then we mutate
those relations in place during partition-wise aggregate so that they
now do produce the final scan/join target and generate some more paths
using the results. So there's an ordering dependency, and the same
pathlist represents different things at different times. That is, I
suppose, not technically any worse than what we're doing for the
scan/join rel's pathlist in general, but here there's the additional
complexity that the paths get used both before and after being
mutated. The UPPERREL_TLIST proposal would clean this up, although I
realize that has unresolved issues.

In create_partial_grouping_paths, the loop that does "for (i = 0; i <
2; i++)" is not exactly what I had in mind when I said that we should
use two loops. I did not mean a loop with two iterations. I meant
adding a loop like foreach(lc, input_rel->pathlist) in each place
where we currently have a loop like
foreach(input_rel->partial_pathlist).

The path creation logic for partial_pathlist and pathlist was identical, and
thus I thought of just looping over it twice, switching the pathlist, so that
we have minimal code to maintain. But yes, I agree that it adds additional
complexity.

See 0001, attached.

Looks great. Thanks.

Don't write if (a) Assert(b) but rather Assert(!a || b). See 0002,
attached.

OK. Noted.

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
wrong.

I don't think so. For parallel case, we do check that. And for
partitionwise aggregation check, it was checked inside
can_partitionwise_grouping() function and flags were set accordingly. Am I
missing something?

If can_partial_agg() isn't accurately determining whether
partial aggregation is possible,

I think it does determine that accurately.
if (!parse->hasAggs && parse->groupClause == NIL)
is only true for DISTINCT queries, which we are not handling here anyway.
For partitionwise aggregate it won't be true, since otherwise it would be a
degenerate grouping case.

and as Ashutosh and I have been
discussing, there's room for improvement in that area, then that's a
topic for some other set of patches. Also, the test in
create_ordinary_grouping_paths for whether or not to call
create_partial_grouping_paths() is super-complicated and uncommented.
I think a simpler approach is to allow create_partial_grouping_paths()
the option of returning NULL. See 0003, attached.

Thanks for simplifying it.

However, after this simplification, we were unnecessarily creating
non-parallel partial aggregation paths for the root input rel when they were
not needed.
Consider a case where we need a partial aggregation from a child. In this
case, extra->is_partial_aggregation = 0 at the root-level entry, as the parent
is still doing full aggregation, but
perform_partial_partitionwise_aggregation is true, which tells a child to
perform partial partitionwise aggregation. In this case,
cheapest_total_path will be set, and thus we will go ahead and create
partial aggregate paths for the parent rel, which are not needed.

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).

make_grouping_rel() claims that "For now, all aggregated paths are
added to the (GROUP_AGG, NULL) upperrel", but this is false: we no
longer have only one grouped upper rel.

Done.

I'm having a heck of a time understanding what is_partial_aggregation
and perform_partial_partitionwise_aggregation are supposed to be
doing.

As you said correctly, is_partial_aggregation denotes that we are doing
ONLY a partial aggregation at this level of the partitioning hierarchy, whereas
perform_partial_partitionwise_aggregation is used to instruct the child
whether it should perform partial or full aggregation at its level. Since
we need to create a partially_grouped_rel, we evaluate all these
possibilities up front and pass them to the child so that the child does
not need to compute them again.

It seems like is_partial_aggregation means that we should ONLY
do partial aggregation, which is not exactly what the name implies.

I think it says that we are doing a partial aggregation, and thus implies that
we should use partially_grouped_rel and skip the finalization step.

It also seems like perform_partial_partitionwise_aggregation and
is_partial_aggregation differ only in that they apply to the current
level and to the child level respectively;

It's the other way. is_partial_aggregation is applied to the current level
and perform_partial_partitionwise_aggregation applies to child level.

can't we merge these
somehow so that we don't need both of them?

I think we can't, as they apply at two different levels. In a scenario in which
we are doing full aggregation at level 1 and need to perform partial
aggregation at level 2, they are different. They will be the same only
if both levels are doing the same thing, so we can't merge them.

Can you think of any better names, as these seem confusing?

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

I think we can't do it this way. Only if *perform_partial_partitionwise_aggregation
is found to be true do we need to check whether partial aggregation
itself is possible. If we are going to perform a full partitionwise
aggregation, then the test for can_partial_agg is not needed. Have I misread
your comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#134

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#133)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
wrong.

I don't think so. For parallel case, we do check that. And for partitionwise
aggregation check, it was checked inside can_partitionwise_grouping()
function and flags were set accordingly. Am I missing something?

Well, one of us is missing something somewhere. If
GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
grouping, and if create_partial_grouping_paths() is where partial
grouping happens, then we should only be calling the latter if the
former is set. I mean, how can it make sense to create
partially-grouped paths if we're not allowed to do partial grouping?

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).

OK, thanks. I wasn't sure I had understood what was going on, so
thanks for checking it.

Thanks also for keeping 0004-0006 separate here, but I think you can
flatten them into one patch in the next version.

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

I think we can't do it this way. Only if *perform_partial_partitionwise_aggregation
is found to be true do we need to check whether partial aggregation
itself is possible. If we are going to perform a full partitionwise
aggregation, then the test for can_partial_agg is not needed. Have I misread
your comments?

It seems you're correct, because when I change it the tests fail. I
don't yet understand why.

Basically, the main patch seems to use three Boolean signaling mechanisms:

1. GROUPING_CAN_PARTITIONWISE_AGG
2. is_partial_aggregation
3. perform_partial_partitionwise_aggregation

Stuff I don't understand:

- Why is one of them a Boolean shoved into "flags", even though it's
not static across the whole hierarchy like the other flags, and the
other two are separate Booleans?
- What do they all do, anyway?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#135

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#134)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 21, 2018 at 7:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 21, 2018 at 8:01 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

In the patch as proposed, create_partial_grouping_paths() can get
called even if GROUPING_CAN_PARTIAL_AGG is not set. I think that's
wrong.

I don't think so. For parallel case, we do check that. And for partitionwise
aggregation check, it was checked inside can_partitionwise_grouping()
function and flags were set accordingly. Am I missing something?

Well, one of us is missing something somewhere. If
GROUPING_CAN_PARTIAL_AGG means that we're allowed to do partial
grouping, and if create_partial_grouping_paths() is where partial
grouping happens, then we should only be calling the latter if the
former is set. I mean, how can it make sense to create
partially-grouped paths if we're not allowed to do partial grouping?

Yes, that's true. If we are not allowed to do partial grouping,
partially grouped paths should not be created.

However, what I mean is that if the added partitionwise-related checks
evaluate to true, it implies that GROUPING_CAN_PARTIAL_AGG is also set, as
it was checked earlier. Thus it does not need an explicit check again.

Anyway, after your refactoring, it becomes more readable now.

I have tweaked these conditions and posted in a separate patch (0006).
However, I have merged all your three patches in one (0005).

OK, thanks. I wasn't sure I had understood what was going on, so
thanks for checking it.

Thanks also for keeping 0004-0006 separate here, but I think you can
flatten them into one patch in the next version.

OK. Sure.

I think that if the last test in can_partitionwise_grouping were moved
before the previous test, it could be simplified to test only
(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0 and not
*perform_partial_partitionwise_aggregation.

I think we can't do it this way. Only if *perform_partial_partitionwise_aggregation
is found to be true do we need to check whether partial aggregation
itself is possible. If we are going to perform a full partitionwise
aggregation, then the test for can_partial_agg is not needed. Have I misread
your comments?

It seems you're correct, because when I change it the tests fail. I
don't yet understand why.

Basically, the main patch seems to use three Boolean signaling mechanisms:

1. GROUPING_CAN_PARTITIONWISE_AGG
2. is_partial_aggregation
3. perform_partial_partitionwise_aggregation

Stuff I don't understand:

- Why is one of them a Boolean shoved into "flags", even though it's
not static across the whole hierarchy like the other flags, and the
other two are separate Booleans?
- What do they all do, anyway?

Let me try to explain this:

1. GROUPING_CAN_PARTITIONWISE_AGG
Tells us whether or not partitionwise grouping and/or aggregation is ever
possible. If it is FALSE, the other two have no meaning. Only if it is TRUE
do we attempt to create partitionwise paths.
I have kept it in "flags" as it looks similar in behavior to other flag
members like can_sort, can_hash etc. And, for a given grouped relation,
whether parent or child, they all work similarly. But yes, for a child
relation, we inherit can_sort/can_hash from the parent as they won't
change, whereas this flag needs to be evaluated for every child.
If required, I can move that to a GroupPathExtraData struct.

2. extra->is_partial_aggregation
This boolean var is used to identify at any given time whether we are
computing a full aggregation or a partial aggregation. This boolean is
necessary when doing partial aggregation to skip finalization. When true,
it also tells us to use partially_grouped_rel.

3. extra->perform_partial_partitionwise_aggregation
This boolean var, when TRUE, instructs the child that it has to create
partially aggregated paths. It is then transferred to
child_extra->is_partial_aggregation in
create_partitionwise_grouping_paths().

Basically (3) is required as we wanted to create a partially_grouped_rel
upfront, so that if the children are going to create partially aggregated
paths, they can append those into the parent's partially grouped rel. Thus
we need to create that rel before we even enter child path creation.
Since (3) is only valid if (1) is true, we need to compute (1) upfront too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#136

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#135)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Let me try to explain this:
1. GROUPING_CAN_PARTITIONWISE_AGG
2. extra->is_partial_aggregation
3. extra->perform_partial_partitionwise_aggregation

Please find attached an incremental patch that attempts to refactor
this logic into a simpler form. What I've done is merged all three of
the above Booleans into a single state variable called 'patype', which
can be one of PARTITIONWISE_AGGREGATE_NONE,
PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
When create_ordinary_grouping_paths() is called, extra.patype is the
value for the parent relation; that function computes a new value and
passes it down to create_partitionwise_grouping_paths(), which inserts
into the new 'extra' structure for the child.

Essentially, in your system, extra->is_partial_aggregation and
extra->perform_partial_partitionwise_aggregation both corresponded to
whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
former indicated whether the *parent* was doing partition-wise
aggregation (and thus we needed to generate only partial paths) while
the latter indicated whether the *current* relation was doing
partition-wise aggregation (and thus we needed to force creation of
partially_grouped_rel). This took me a long time to understand
because of the way the fields were named; they didn't indicate that
one was for the parent and one for the current relation. Meanwhile,
GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
aggregate should be tried at all for the current relation; there was
no analogous indicator for the parent relation because we can't be
processing a child at all if the parent didn't decide to do
partition-wise aggregation. So to know what was happening for the
current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
extra->perform_partial_partitionwise_aggregation, and to know what was
happening for the parent relation you just looked at
extra->is_partial_aggregation. With this proposed refactoring patch,
there's just one patype value at each level, which at least to me
seems simpler. I tried to improve the comments somewhat, too.
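As a rough illustration of the single-state scheme described above, the following standalone C sketch models how a patype value might be derived at each level from the parent's value. The enum names come from the attached patch; the LevelInfo struct and choose_patype() are hypothetical simplifications invented for this example, standing in for the checks on input_rel and extra->flags.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
	PARTITIONWISE_AGGREGATE_NONE,
	PARTITIONWISE_AGGREGATE_FULL,
	PARTITIONWISE_AGGREGATE_PARTIAL
} PartitionwiseAggregateType;

/* Hypothetical, simplified description of one level of the hierarchy. */
typedef struct
{
	bool		is_partitioned; /* has a partition scheme and child rels */
	bool		is_dummy;		/* proven empty */
	bool		group_by_has_partkey;	/* GROUP BY covers the partition key */
	bool		can_partial_agg;	/* GROUPING_CAN_PARTIAL_AGG is set */
} LevelInfo;

/*
 * Derive the value for the current level from the parent's value.  At the
 * topmost level the "parent" value is FULL if the feature is enabled at all.
 */
static PartitionwiseAggregateType
choose_patype(PartitionwiseAggregateType parent_patype, const LevelInfo *lvl)
{
	/* Nothing to do if the parent gave up, or there is nothing to split. */
	if (parent_patype == PARTITIONWISE_AGGREGATE_NONE ||
		!lvl->is_partitioned || lvl->is_dummy)
		return PARTITIONWISE_AGGREGATE_NONE;

	/* Full aggregation needs a fully aggregating parent and the key. */
	if (parent_patype == PARTITIONWISE_AGGREGATE_FULL &&
		lvl->group_by_has_partkey)
		return PARTITIONWISE_AGGREGATE_FULL;

	/* Otherwise fall back to partial aggregation, if it is supported. */
	if (lvl->can_partial_agg)
		return PARTITIONWISE_AGGREGATE_PARTIAL;

	return PARTITIONWISE_AGGREGATE_NONE;
}
```

In this model, once a level has chosen PARTITIONWISE_AGGREGATE_PARTIAL, every level below it can be at most partial as well, which matches the parent/current distinction drawn above.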

You have some long lines that it would be good to break, like this:

child_extra.targetList = (List *) adjust_appendrel_attrs(root,
                                                         (Node *) extra->targetList,
                                                         nappinfos,
                                                         appinfos);

If you put a newline after (List *), the formatting will come out
nicer -- it will fit within 80 columns. Please go through the patches
and make these kinds of changes for lines over 80 columns where
possible.
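For illustration, here is roughly what the suggested line break looks like. To keep the fragment compilable on its own, the types and adjust_appendrel_attrs() below are hypothetical stand-ins rather than the real planner definitions, and translate_target_list() is an invented wrapper; only the layout of the assignment is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so that the fragment compiles on its own. */
typedef struct Node { int tag; } Node;
typedef struct List { Node node; } List;
typedef struct PlannerInfo { int unused; } PlannerInfo;
typedef struct AppendRelInfo AppendRelInfo;

/* Stub with the same argument order as the call quoted above. */
static Node *
adjust_appendrel_attrs(PlannerInfo *root, Node *node, int nappinfos,
					   AppendRelInfo **appinfos)
{
	(void) root;
	(void) nappinfos;
	(void) appinfos;
	return node;				/* the stub returns its input unchanged */
}

static List *
translate_target_list(PlannerInfo *root, List *targetList, int nappinfos,
					  AppendRelInfo **appinfos)
{
	List	   *result;

	/* A newline after the cast lets the arguments start further left. */
	result = (List *)
		adjust_appendrel_attrs(root,
							   (Node *) targetList,
							   nappinfos,
							   appinfos);
	return result;
}
```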

I guess we'd better move the GROUPING_CAN_* constants to a header
file, if they're going to be exposed through GroupPathExtraData. That
can go in some refactoring patch.

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Refactor-partitonwise-aggregate-signalling.patch (application/octet-stream)

From 71bb618f69872623366ca85ff4799c99c7ca9e1f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 21 Mar 2018 17:34:17 -0400
Subject: [PATCH] Refactor partitonwise aggregate signalling.

---
 src/backend/optimizer/plan/planner.c | 215 ++++++++++++++---------------------
 src/include/nodes/relation.h         |  26 ++++-
 2 files changed, 105 insertions(+), 136 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f2b1a8bf39..e3d4dcaae7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -107,14 +107,10 @@ typedef struct
  * GROUPING_CAN_PARTIAL_AGG should be set if the aggregation is of a type
  * for which we support partial aggregation (not, for example, grouping sets).
  * It says nothing about parallel-safety or the availability of suitable paths.
- *
- * GROUPING_CAN_PARTITIONWISE_AGG should be set if it's possible to perform
- * partitionwise grouping and/or aggregation.
  */
 #define GROUPING_CAN_USE_SORT       0x0001
 #define GROUPING_CAN_USE_HASH       0x0002
 #define GROUPING_CAN_PARTIAL_AGG	0x0004
-#define GROUPING_CAN_PARTITIONWISE_AGG	0x0008
 
 /*
  * Data specific to grouping sets
@@ -237,7 +233,8 @@ static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
 							  RelOptInfo *grouped_rel,
 							  RelOptInfo *input_rel,
 							  grouping_sets_data *gd,
-							  GroupPathExtraData *extra);
+							  GroupPathExtraData *extra,
+							  bool force_rel_creation);
 static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
 static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
@@ -246,18 +243,13 @@ static void apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   PathTarget *scanjoin_target,
 							   bool scanjoin_target_parallel_safe,
 							   bool modify_in_place);
-static bool can_partitionwise_grouping(PlannerInfo *root,
-						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel,
-						   GroupPathExtraData *extra,
-						   grouping_sets_data *gd,
-						   bool *perform_partial_partitionwise_aggregation);
 static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
 									RelOptInfo *partially_grouped_rel,
 									const AggClauseCosts *agg_costs,
 									grouping_sets_data *gd,
+									PartitionwiseAggregateType patype,
 									GroupPathExtraData *extra);
 static bool group_by_has_partkey(RelOptInfo *input_rel,
 					 RelOptInfo *grouped_rel,
@@ -3742,16 +3734,17 @@ create_grouping_paths(PlannerInfo *root,
 		extra.havingQual = parse->havingQual;
 		extra.targetList = parse->targetList;
 		extra.partial_costs_set = false;
-		extra.is_partial_aggregation = false;
 
 		/*
-		 * Check whether we can perform partitionwise grouping and/or
-		 * aggregation.
+		 * Determine whether partitionwise aggregation is in theory possible.
+		 * It can be disabled by the user, and for now, we don't try to
+		 * support grouping sets.  create_ordinary_grouping_paths() will check
+		 * additional conditions, such as whether input_rel is partitioned.
 		 */
-		if (can_partitionwise_grouping(root, input_rel, grouped_rel, &extra,
-									   gd,
-									   &extra.perform_partial_partitionwise_aggregation))
-			extra.flags |= GROUPING_CAN_PARTITIONWISE_AGG;
+		if (enable_partitionwise_aggregate && !parse->groupingSets)
+			extra.patype = PARTITIONWISE_AGGREGATE_FULL;
+		else
+			extra.patype = PARTITIONWISE_AGGREGATE_NONE;
 
 		create_ordinary_grouping_paths(root, input_rel, grouped_rel,
 									   agg_costs, gd, &extra,
@@ -3923,6 +3916,38 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	Path	   *cheapest_path = input_rel->cheapest_total_path;
 	RelOptInfo *partially_grouped_rel = NULL;
 	double		dNumGroups;
+	PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
+
+	/*
+	 * If this is the topmost grouping relation or if the parent relation is
+	 * doing some form of partitionwise aggregation, then we may be able to do
+	 * it at this level also.  However, if the input relation is not
+	 * partitioned, partition-wise aggregate is impossible, and if it is dummy,
+	 * partition-wise aggregate is pointless.
+	 */
+	if (extra->patype != PARTITIONWISE_AGGREGATE_NONE &&
+		input_rel->part_scheme && input_rel->part_rels &&
+		!IS_DUMMY_REL(input_rel))
+	{
+		/*
+		 * If this is the topmost relation or if the parent relation is doing
+		 * full partitionwise aggregation, then we can do full partitionwise
+		 * aggregation provided that the GROUP BY clause contains all of the
+		 * partitioning columns at this level.  Otherwise, we can do at most
+		 * partial partitionwise aggregation.  But if partial aggregation is
+		 * not supported in general then we can't use it for partitionwise
+		 * aggregation either.
+		 */
+		if (extra->patype == PARTITIONWISE_AGGREGATE_FULL &&
+			group_by_has_partkey(input_rel, grouped_rel,
+								 extra->targetList,
+								 root->parse->groupClause))
+			patype = PARTITIONWISE_AGGREGATE_FULL;
+		else if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+			patype = PARTITIONWISE_AGGREGATE_PARTIAL;
+		else
+			patype = PARTITIONWISE_AGGREGATE_NONE;
+	}
 
 	/*
 	 * Before generating paths for grouped_rel, we first generate any possible
@@ -3930,12 +3955,24 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	 * parallel and non-parallel approaches to grouping.
 	 */
 	if ((extra->flags & GROUPING_CAN_PARTIAL_AGG) != 0)
+	{
+		bool	force_rel_creation;
+
+		/*
+		 * If we're doing partition-wise aggregation at this level, force
+		 * creation of a partially_grouped_rel so we can add partition-wise
+		 * paths to it.
+		 */
+		force_rel_creation = (patype == PARTITIONWISE_AGGREGATE_PARTIAL);
+
 		partially_grouped_rel =
 			create_partial_grouping_paths(root,
 										  grouped_rel,
 										  input_rel,
 										  gd,
-										  extra);
+										  extra,
+										  force_rel_creation);
+	}
 
 	/*
 	 * Set partially_grouped_rel_p so that the caller get the newly created
@@ -3944,13 +3981,13 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	*partially_grouped_rel_p = partially_grouped_rel;
 
 	/* Apply partitionwise aggregation technique, if possible. */
-	if ((extra->flags & GROUPING_CAN_PARTITIONWISE_AGG) != 0)
+	if (patype != PARTITIONWISE_AGGREGATE_NONE)
 		create_partitionwise_grouping_paths(root, input_rel, grouped_rel,
 											partially_grouped_rel, agg_costs,
-											gd, extra);
+											gd, patype, extra);
 
 	/* If we are doing partial aggregation only, return. */
-	if (extra->is_partial_aggregation)
+	if (extra->patype == PARTITIONWISE_AGGREGATE_PARTIAL)
 	{
 		Assert(partially_grouped_rel);
 
@@ -6393,7 +6430,8 @@ create_partial_grouping_paths(PlannerInfo *root,
 							  RelOptInfo *grouped_rel,
 							  RelOptInfo *input_rel,
 							  grouping_sets_data *gd,
-							  GroupPathExtraData *extra)
+							  GroupPathExtraData *extra,
+							  bool force_rel_creation)
 {
 	Query	   *parse = root->parse;
 	RelOptInfo *partially_grouped_rel;
@@ -6409,11 +6447,13 @@ create_partial_grouping_paths(PlannerInfo *root,
 
 	/*
 	 * Consider whether we should generate partially aggregated non-partial
-	 * paths.  We can only do this if we have a non-partial path, and in
-	 * addition the caller must have requested it by setting
-	 * extra->is_partial_aggregation.
+	 * paths.  We can only do this if we have a non-partial path, and only if
+	 * the parent of the input rel is performing partial partitionwise
+	 * aggregation.  (Note that extra->patype is the type of partitionwise
+	 * aggregation being used at the parent level, not this level.)
 	 */
-	if (input_rel->pathlist != NIL && extra->is_partial_aggregation)
+	if (input_rel->pathlist != NIL &&
+		extra->patype == PARTITIONWISE_AGGREGATE_PARTIAL)
 		cheapest_total_path = input_rel->cheapest_total_path;
 
 	/*
@@ -6426,16 +6466,12 @@ create_partial_grouping_paths(PlannerInfo *root,
 
 	/*
 	 * If we can't partially aggregate partial paths, and we can't partially
-	 * aggregate non-partial paths, then there may not be any point to
-	 * creating a new RelOptInfo after all.  However, if partitionwise
-	 * aggregate is a possibility and going to perform a partial aggregation,
-	 * then we need to create the RelOptInfo anyway, because the caller may
-	 * want to add Append paths to it.
+	 * aggregate non-partial paths, then don't bother creating the new
+	 * RelOptInfo at all, unless the caller specified force_rel_creation.
 	 */
 	if (cheapest_total_path == NULL &&
 		cheapest_partial_path == NULL &&
-		((extra->flags & GROUPING_CAN_PARTITIONWISE_AGG) == 0 ||
-		 !extra->perform_partial_partitionwise_aggregation))
+		!force_rel_creation)
 		return NULL;
 
 	/*
@@ -6855,77 +6891,6 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	}
 }
 
-/*
- * can_partitionwise_grouping
- *
- * Can we use partitionwise grouping technique?
- */
-static bool
-can_partitionwise_grouping(PlannerInfo *root,
-						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel,
-						   GroupPathExtraData *extra,
-						   grouping_sets_data *gd,
-						   bool *perform_partial_partitionwise_aggregation)
-{
-	Query	   *parse = root->parse;
-
-	*perform_partial_partitionwise_aggregation = false;
-
-	/* No, if user disabled partitionwise aggregation. */
-	if (!enable_partitionwise_aggregate)
-		return false;
-
-	/*
-	 * Currently, grouping sets plan does not work with an inheritance subtree
-	 * (see notes in create_groupingsets_plan). Moreover, grouping sets
-	 * implies multiple group by clauses, each of which may not have all
-	 * partition keys. Those sets which have all partition keys will be
-	 * computed completely for each partition, but others will require partial
-	 * aggregation. We will need to apply partitionwise aggregation at each
-	 * derived group by clause and not as a whole-sale strategy.  Due to this
-	 * we won't be able to compute "whole" grouping sets here and thus bail
-	 * out.
-	 */
-	if (parse->groupingSets || gd)
-		return false;
-
-	/*
-	 * Nothing to do, if the input relation is not partitioned or it has no
-	 * partitioned relations.
-	 */
-	if (!input_rel->part_scheme || !input_rel->part_rels)
-		return false;
-
-	/* Nothing to do, if the input relation itself is dummy. */
-	if (IS_DUMMY_REL(input_rel))
-		return false;
-
-	/*
-	 * If partition keys are part of group by clauses, then we can do full
-	 * partitionwise aggregation.  Otherwise need to calculate partial
-	 * aggregates for each partition and combine them.
-	 *
-	 * However, if caller forces to perform partial aggregation, then do that
-	 * unconditionally.
-	 */
-	*perform_partial_partitionwise_aggregation = (extra->is_partial_aggregation ||
-												  !group_by_has_partkey(input_rel,
-																		grouped_rel,
-																		extra->targetList,
-																		parse->groupClause));
-
-	/*
-	 * If we need to perform partial aggregation for every child but cannot
-	 * compute partial aggregates, no partitionwise grouping is possible.
-	 */
-	if (*perform_partial_partitionwise_aggregation &&
-		(extra->flags & GROUPING_CAN_PARTIAL_AGG) == 0)
-		return false;
-
-	return true;
-}
-
 /*
  * create_partitionwise_grouping_paths
  *
@@ -6951,25 +6916,22 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *partially_grouped_rel,
 									const AggClauseCosts *agg_costs,
 									grouping_sets_data *gd,
+									PartitionwiseAggregateType patype,
 									GroupPathExtraData *extra)
 {
-	int			nparts;
+	int			nparts = input_rel->nparts;
 	int			cnt_parts;
 	RelOptInfo **part_rels;
 	List	   *grouped_live_children = NIL;
 	List	   *partially_grouped_live_children = NIL;
 	PathTarget *target = extra->target;
 
-	nparts = input_rel->nparts;
-	part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
-
-	/*
-	 * If partial partitionwise aggregation needs to be performed, then we
-	 * must have created a partially_grouped_rel already.
-	 */
-	Assert(!extra->perform_partial_partitionwise_aggregation ||
+	Assert(patype != PARTITIONWISE_AGGREGATE_NONE);
+	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
 		   partially_grouped_rel != NULL);
 
+	part_rels = (RelOptInfo **) palloc(nparts * sizeof(RelOptInfo *));
+
 	/*
 	 * For full aggregation or grouping, each partition produces a disjoint
 	 * groups which can simply be appended and thus we can say that the child
@@ -6977,7 +6939,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	 * the partitioning details for this grouped rel. In case of a partial
 	 * aggregation, this is not true.
 	 */
-	if (!extra->perform_partial_partitionwise_aggregation)
+	if (patype == PARTITIONWISE_AGGREGATE_FULL)
 	{
 		grouped_rel->part_scheme = input_rel->part_scheme;
 		grouped_rel->nparts = nparts;
@@ -7026,8 +6988,12 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 																 nappinfos,
 																 appinfos);
 
-		/* Is partially aggregated result expected from every child? */
-		child_extra.is_partial_aggregation = extra->perform_partial_partitionwise_aggregation;
+		/*
+		 * extra->patype was the value computed for our parent rel; patype
+		 * is the value for this relation.  For the child, our value is its
+		 * parent rel's value.
+		 */
+		child_extra.patype = patype;
 
 		/*
 		 * Create grouping relation to hold fully aggregated grouping and/or
@@ -7046,17 +7012,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			continue;
 		}
 
-		/*
-		 * Check whether we can perform partitionwise grouping and/or
-		 * aggregation on this child grouped rel.
-		 */
-		if (can_partitionwise_grouping(root, child_input_rel,
-									   child_grouped_rel, &child_extra, gd,
-									   &child_extra.perform_partial_partitionwise_aggregation))
-			child_extra.flags |= GROUPING_CAN_PARTITIONWISE_AGG;
-		else
-			child_extra.flags &= ~GROUPING_CAN_PARTITIONWISE_AGG;
-
 		/*
 		 * Copy pathtarget from underneath scan/join as we are modifying it
 		 * and translate its Vars with respect to this appendrel.  We use
@@ -7090,7 +7045,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 													  child_partially_grouped_rel);
 		}
 
-		if (!extra->perform_partial_partitionwise_aggregation)
+		if (patype == PARTITIONWISE_AGGREGATE_FULL)
 		{
 			part_rels[cnt_parts] = child_grouped_rel;
 			grouped_live_children = lappend(grouped_live_children,
@@ -7134,7 +7089,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	 * Now, create append rel for all grouped children and stick them into the
 	 * grouped_rel.
 	 */
-	if (!extra->perform_partial_partitionwise_aggregation)
+	if (patype == PARTITIONWISE_AGGREGATE_FULL)
 		add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
 }
 
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index a920b30def..f2883de1f2 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2295,6 +2295,24 @@ typedef struct JoinPathExtraData
 	Relids		param_source_rels;
 } JoinPathExtraData;
 
+/*
+ * What kind of partitionwise aggregation is in use?
+ *
+ * PARTITIONWISE_AGGREGATE_NONE: Not used.
+ *
+ * PARTITIONWISE_AGGREGATE_FULL: Aggregate each partition separately, and
+ * append the results.
+ *
+ * PARTITIONWISE_AGGREGATE_PARTIAL: Partially aggregate each partition
+ * separately, append the results, and then finalize aggregation.
+ */
+typedef enum
+{
+	PARTITIONWISE_AGGREGATE_NONE,
+	PARTITIONWISE_AGGREGATE_FULL,
+	PARTITIONWISE_AGGREGATE_PARTIAL
+} PartitionwiseAggregateType;
+
 /*
  * Struct for extra information passed to subroutines of create_grouping_paths
  *
@@ -2307,9 +2325,7 @@ typedef struct JoinPathExtraData
  * target_parallel_safe is true if target is parallel safe.
  * havingQual gives list of quals to be applied post aggregation.
  * targetList gives list of columns to be projected.
- * is_partial_aggregation is true if doing partial aggregation.
- * perform_partial_partitionwise_aggregation is true if child needs to perform
- * 		partial partitionwise aggregation.
+ * patype is the type of partitionwise aggregation that is being performed.
  */
 typedef struct
 {
@@ -2327,9 +2343,7 @@ typedef struct
 	bool		target_parallel_safe;
 	Node	   *havingQual;
 	List	   *targetList;
-
-	bool		is_partial_aggregation;
-	bool		perform_partial_partitionwise_aggregation;
+	PartitionwiseAggregateType patype;
 } GroupPathExtraData;
 
 /*
-- 
2.14.3 (Apple Git-98)

#137

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#136)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

That would simplify some code in these patches. We have set
upper_rel->relids to NULL for non-"other" upper relations, since Tom
expected to use relids to mean something other than scan/join relids.
With these patch sets, for grouping rels we are setting upper_rel->relids
to the relids of the underlying scan/join relation. So it does make sense
to set relids to input_rel->relids for all the grouping rels, whether
"other" or non-"other" grouping rels.

But with this change, we have to change all the existing code to pass
input_rel->relids to fetch_upper_rel(). If we don't do that, or if in
future somebody calls that function with relids = NULL, we will produce
two relations which are supposed to do the same thing but have
different relids set. That's because fetch_upper_rel() creates a
relation if one does not exist, whether or not the caller intends to
create one. We should probably create two functions, (1) to build an
upper relation and (2) to search for it, similar to what we have done
for join relations and base relations. The other possibility is to pass
a flag to fetch_upper_rel() indicating whether the caller intends to
create a new relation when one doesn't exist. With this design a
caller can be sure that an upper relation will not be created when it
wants to just fetch an existing relation (and error out/assert if it
doesn't find one).
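The first option, separate build and search functions, could look something like the following standalone C sketch. Everything here is hypothetical: the UpperRel struct, the unsigned bitmask standing in for a relids set, and the names find_upper_rel() and build_upper_rel() are invented for the example and are not part of the patches.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much-simplified upper relation. */
typedef struct UpperRel
{
	int			kind;			/* which upper-planner stage this is */
	unsigned	relids;			/* bitmask standing in for a relids set */
	struct UpperRel *next;
} UpperRel;

/* Search only: returns NULL rather than creating a missing relation. */
static UpperRel *
find_upper_rel(UpperRel *head, int kind, unsigned relids)
{
	UpperRel   *rel;

	for (rel = head; rel != NULL; rel = rel->next)
	{
		if (rel->kind == kind && rel->relids == relids)
			return rel;
	}
	return NULL;
}

/* Build only: the caller asserts that no such relation exists yet. */
static UpperRel *
build_upper_rel(UpperRel **head, int kind, unsigned relids)
{
	UpperRel   *rel;

	assert(find_upper_rel(*head, kind, relids) == NULL);

	rel = malloc(sizeof(UpperRel));
	if (rel == NULL)
		abort();
	rel->kind = kind;
	rel->relids = relids;
	rel->next = *head;
	*head = rel;
	return rel;
}
```

With this split, a lookup using the wrong relids (for example 0 instead of the input relation's relids) returns NULL and can be caught by the caller, instead of quietly creating a second relation.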

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#138

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#136)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 21, 2018 at 11:33 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Let me try to explain this:
1. GROUPING_CAN_PARTITIONWISE_AGG
2. extra->is_partial_aggregation
3. extra->perform_partial_partitionwise_aggregation

Please find attached an incremental patch that attempts to refactor
this logic into a simpler form. What I've done is merged all three of
the above Booleans into a single state variable called 'patype', which
can be one of PARTITIONWISE_AGGREGATE_NONE,
PARTITIONWISE_AGGREGATE_FULL, and PARTITIONWISE_AGGREGATE_PARTIAL.
When create_ordinary_grouping_paths() is called, extra.patype is the
value for the parent relation; that function computes a new value and
passes it down to create_partitionwise_grouping_paths(), which inserts
into the new 'extra' structure for the child.

Essentially, in your system, extra->is_partial_aggregation and
extra->perform_partial_partitionwise_aggregation both corresponded to
whether or not patype == PARTITIONWISE_AGGREGATE_PARTIAL, but the
former indicated whether the *parent* was doing partition-wise
aggregation (and thus we needed to generate only partial paths) while
the latter indicated whether the *current* relation was doing
partition-wise aggregation (and thus we needed to force creation of
partially_grouped_rel). This took me a long time to understand
because of the way the fields were named; they didn't indicate that
one was for the parent and one for the current relation. Meanwhile,
GROUPING_CAN_PARTITIONWISE_AGG indicated whether partition-wise
aggregate should be tried at all for the current relation; there was
no analogous indicator for the parent relation because we can't be
processing a child at all if the parent didn't decide to do
partition-wise aggregation. So to know what was happening for the
current relation you had to look at GROUPING_CAN_PARTITIONWISE_AGG +
extra->perform_partial_partitionwise_aggregation, and to know what was
happening for the parent relation you just looked at
extra->is_partial_aggregation. With this proposed refactoring patch,
there's just one patype value at each level, which at least to me
seems simpler. I tried to improve the comments somewhat, too.

Looks cleaner now. Thanks for refactoring it.

I have merged these changes and flattened all previous changes into the main
patch.

You have some long lines that it would be good to break, like this:

child_extra.targetList = (List *) adjust_appendrel_attrs(root,
                                                         (Node *) extra->targetList,
                                                         nappinfos,
                                                         appinfos);

If you put a newline after (List *), the formatting will come out
nicer -- it will fit within 80 columns. Please go through the patches
and make these kinds of changes for lines over 80 columns where
possible.

OK. Done.
I was under the impression that it is not good to split the first line. What
if, in the split version, something gets inserted between the two lines while
applying a new patch or during a merge? That would result in something
unexpected. But I am not sure if that's ever a possibility.

I guess we'd better move the GROUPING_CAN_* constants to a header
file, if they're going to be exposed through GroupPathExtraData. That
can go in some refactoring patch.

Yep. Moved that into 0003 where we create GroupPathExtraData.

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

partition-wise-agg-v23.tar.gz (application/gzip)

����Z�<kw��������c�0����8������l��{8Bj@!I��<��VUw���������J��]]�wu�����������k�t�=��W/y����M���f;���Wz�0�}���j�]��}�����XF�2���,�z����K�E9��E�-�Xv��8pi���\�E��������X�ma��Hs�8h�O{�_��u+��������m��W��W��9���`�������'��	w�����}����>��\����}��;�`z���'�?3�i5s���h��f�����	��l��k��<\� e��e���o�������d���6m}�t��c��]�h��v�n9����	������{�����h2M�D�)�dB4Y0M�����������l��m�V��
�!��Q�P`��&&�i,?��&{�������r
�=*�ky^�����!�Z�]��DQ(`���}������"k���=s=��~������'4X%��Z<�b\��?B��������
��j���C�hl����������h�z����e�?���F������9��(,z[���x�kF��A�����'��Z�q'�iS�o�h+��[
����?��9h��I�50������n���jp������L3:�f�5�K����6h�w��/�I���[G��)���o<14��YS@�a��7��-	�v�:�3%(A��t�=8l�U�N��3�ZF�,�������?l#���Hi�Hh�(����'��(�����{�����qts��Q�<�{��������i���6GR�G�/y�|��f���	?��1JH�IA�2^���wc���h�p@��EG����y���{<`w���$�Yk�j�n�vO��[���'�'B����h:��h,���\�`�����[���r\�u��}i	G�"������E��#��=�#�p8�:�q��'c{nK(��@�I WL�#!&�Aj��D��������E����eX`�~�8��Y��$���0��f�Nz2b��p��!�[.Zk��{��������,G����98��$PEg��>a\@&b<RD!0��j�:�<Bp�4-{V��&��a�c��8�y��~��=���h9���
��9vx���_=�#7�):y������ ����
��py)gK��\,��p�R���g�%�$�\�Y�t	�x:A�,8`���DK����6��p��&_�{d�s'�N��u�(%I����C�`O�K��&O��iR�F)T��G>c!�-�I���?��=RV��b�CF�J�[����-)�M�`B���=��]!����G�Z� �.*���)Xgi+[��P���l�T�lA��"�y �!J���*�����c�+� ����&��`����	���"Yp�����/cXg�T�%�t"��,\yB���Z�3Z�rJ.#����G�#klK����be��>�Z����*��������3�
�-��ZE|c�%���]i�����m���!O��	��a 2 ���(7�U�I��c
`��8�x��!*`���(y��\Y���3�;5�9�J�Q������\)��ys��2cc�����X��;P4��!��>&E��6Y���h�� VssZ����PIn������F�"p��"��5�.'��
����
���U!b� ����Y�����#�^�A\/._BT�!�^u�E���1E���h�K,�:^��~�3�bO ��!>�0@��.���a���3+a>^>��������m���4o�^������<|"��#6
�%�
y�}�K�I p�2�@n��3�ahU]m�c�"���j��	V��;��	>�@������J�x�L�������I�!`=7��%����|~!='�h��j���Bj��CNo��e�b;�M#I��v��3��]��g���r��y~\B�����=IT2����s��'a���N-{�m�'�_c��a�R�1����4M��6S�tRc��ZcO�^E��XM���htKkS��!g����\,}��93�y���X��gb������R0esw��J20j�,l]���2��Y�4����{e���j%�����2��K��fYj
���K�de�Ke��6+��s~U4���k%�����3l�P(s�{�->'�*�Q�3�%�*��$��r��I�TL�Rw�6�"rA�����:-�V��.�f�������:������
o���`��k���M����w��v����������������vl����5�n���;��~����mX�������4�hg���}p��W��
��+�D���hR��DSX���]����"Q�^���~r���N�|��]K�,,���:8��1�R9��;�e�v�`i�}M�X{0�JK��Q���U4)��� h�	���[��0�{@�"���#A�>k��i#bR�����w+�������jy��+V���_\�E�V���/_��{T�o���)���l4r���p��b���L�O%s����kZ�� ��C.Ql����]�p��*�Z2k6D���H��G�@k�����C0a���WA�*�XC���W	�

#139

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#137)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 22, 2018 at 10:06 AM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Mar 22, 2018 at 3:26 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

Is there a good reason not to use input_rel->relids as the input to
fetch_upper_rel() in all cases, rather than just at subordinate
levels?

That would simplify some code in these patches. We have set
upper_rel->relids to NULL for non-"other" upper relations since Tom
expected to use relids to mean something other than scan/join relids.
With these patch sets, for grouping rels we are setting upper_rel->relids
to the relids of the underlying scan/join relation. So it does make sense
to set relids to input_rel->relids for all the grouping rels, whether
"other" or non-"other" grouping rels.

But with this change, we have to change all the existing code to pass
input_rel->relids to fetch_upper_rel(). If we don't do that, or if in
the future somebody calls that function with relids = NULL, we will produce
two relations which are supposed to do the same thing but have
different relids set. That's because fetch_upper_rel() creates a
relation if one does not exist, whether or not the caller intends to
create one. We should probably create two functions, one to build an
upper relation and one to search for it, similar to what we have done
for join relations and base relations. The other possibility is to pass
a flag to fetch_upper_rel() indicating whether a caller intends to
create a new relation when one doesn't exist. With this design a
caller can be sure that an upper relation will not be created when it
wants to just fetch an existing relation (and error out/assert if it
doesn't find one).

As Ashutosh said, splitting fetch_upper_rel() into two functions,
build_upper_rel() and find_upper_rel(), looks better.
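
To make the proposed split concrete, here is a minimal sketch. It is not planner code: UpperRel, PlannerState and the bitmask Relids are hypothetical, heavily simplified stand-ins for the real planner structures, and only the function names build_upper_rel() and find_upper_rel() come from this discussion. The point is that a lookup never creates a relation as a side effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for planner structures (assumptions, not PostgreSQL types). */
typedef unsigned int Relids;        /* bitmask; 0 plays the role of NULL relids */

typedef struct UpperRel
{
    int              kind;          /* which upper stage, e.g. grouping */
    Relids           relids;        /* relids of the underlying scan/join rel */
    struct UpperRel *next;
} UpperRel;

typedef struct PlannerState
{
    UpperRel *upper_rels;           /* upper rels built so far */
} PlannerState;

/* Search only: returns NULL when no matching upper rel exists. */
UpperRel *
find_upper_rel(PlannerState *root, int kind, Relids relids)
{
    UpperRel *rel;

    for (rel = root->upper_rels; rel != NULL; rel = rel->next)
    {
        if (rel->kind == kind && rel->relids == relids)
            return rel;
    }
    return NULL;
}

/* Build only: the caller must not ask for a rel that already exists. */
UpperRel *
build_upper_rel(PlannerState *root, int kind, Relids relids)
{
    UpperRel *rel;

    assert(find_upper_rel(root, kind, relids) == NULL);

    rel = malloc(sizeof(UpperRel));
    if (rel == NULL)
        abort();
    rel->kind = kind;
    rel->relids = relids;
    rel->next = root->upper_rels;
    root->upper_rels = rel;
    return rel;
}
```

With this shape, a caller that mistakenly looks up a grouping rel with empty relids gets NULL back instead of silently creating a second relation for the same purpose.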

However, I am not sure whether setting relids in a top-most grouped rel is
a good idea or not. I remember we needed this while working on Aggregate
PushDown, and in [1] Tom Lane opposed the idea of setting the relids in
grouped_rel.

If we want to go with this, then I think it should be done as a separate
stand-alone patch.

[1]: /messages/by-id/CAFjFpRdUz6h6cmFZFYAngmQAX8Zvo+MZsPXidZ077h=gp9bvQw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#140

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#138)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 22, 2018 at 6:15 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Looks cleaner now. Thanks for refactoring it.

I have merged these changes and flattened all previous changes into the main
patch.

Committed 0001-0005. I made a few further modifications. These were
mostly cosmetic, but with two exceptions:

1. I moved one set_cheapest() call to avoid doing that twice for the
top-level grouped_rel.

2. I removed the logic to set partition properties for grouped_rels.
As far as I can see, there's nothing that needs this. It would be
important if we wanted subsequent planning stages to be able to do
partition-wise stuff, e.g. when doing window functions or setops, or
at higher query levels. Maybe we'll have that someday; until then, I
think this is just a waste of cycles.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#141

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Robert Haas (#140)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 22, 2018 at 10:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 22, 2018 at 6:15 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Looks cleaner now. Thanks for refactoring it.

I have merged these changes and flattened all previous changes into the main
patch.

Committed 0001-0005.

Thanks Robert.

I made a few further modifications. These were
mostly cosmetic, but with two exceptions:

1. I moved one set_cheapest() call to avoid doing that twice for the
top-level grouped_rel.

2. I removed the logic to set partition properties for grouped_rels.
As far as I can see, there's nothing that needs this. It would be
important if we wanted subsequent planning stages to be able to do
partition-wise stuff, e.g. when doing window functions or setops, or
at higher query levels. Maybe we'll have that someday; until then, I
think this is just a waste of cycles.

OK.

The postgres_fdw changes that allow pushing aggregates to the foreign
server are not yet committed. Because of this, we end up getting an
error when we have foreign partitions plus aggregation.

Attached is the 0001 patch (0006 from my earlier patch set), which adds
support for this and thus avoids the error.
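
As a rough illustration of the approach described in the patch's commit message (an extra untyped pointer handed to the upper-path callback, carrying grouping details for the grouping stage and NULL otherwise), here is a minimal sketch. UpperStage, GroupExtra and can_push_upper_path() are hypothetical names invented for this example; they are not the postgres_fdw or planner definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (assumptions, not the real planner types). */
typedef enum UpperStage
{
    STAGE_GROUP_AGG,        /* grouping/aggregation */
    STAGE_ORDERED           /* some other upper stage */
} UpperStage;

typedef struct GroupExtra
{
    int partial_only;       /* nonzero if only partial aggregation is wanted */
} GroupExtra;

/*
 * Sketch of an FDW-side check. 'extra' is interpreted according to
 * 'stage': grouping details for STAGE_GROUP_AGG, NULL for other stages.
 * Returns 1 if the aggregate would be pushed to the remote server.
 */
int
can_push_upper_path(UpperStage stage, void *extra)
{
    const GroupExtra *gextra;

    if (stage != STAGE_GROUP_AGG)
    {
        assert(extra == NULL);  /* other stages carry no details */
        return 0;               /* only grouping is handled */
    }

    assert(extra != NULL);
    gextra = (const GroupExtra *) extra;

    /* A partially aggregated result cannot be fetched, so push only full aggregation. */
    if (gextra->partial_only)
        return 0;

    return 1;
}
```

The untyped pointer keeps the callback signature stable if other stages later need their own detail structures, at the cost of relying on the stage value to pick the right cast.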

Thanks

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

0001-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch (text/x-patch; charset=US-ASCII)

From 1173c609f6c6b2b45c0d1ce05533b840f7885717 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Fri, 23 Mar 2018 14:23:53 +0530
Subject: [PATCH] Teach postgres_fdw to push aggregates for child relations
 too.

GetForeignUpperPaths() now takes an extra void parameter which will
be used to pass any additional details required to create an upper
path at the remote server. However, we support only grouping over
remote server today and thus it passes grouping specific details i.e.
GroupPathExtradata, NULL otherwise.

Since we don't know how to get a partially aggregated result from a
remote server, only full aggregation is pushed on the remote server.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 132 +++++++++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.c            |  48 ++++++---
 contrib/postgres_fdw/postgres_fdw.h            |   2 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  51 ++++++++++
 doc/src/sgml/fdwhandler.sgml                   |   8 +-
 src/backend/optimizer/plan/createplan.c        |   7 +-
 src/backend/optimizer/plan/planner.c           |  29 +++---
 src/backend/optimizer/prep/prepunion.c         |   2 +-
 src/include/foreign/fdwapi.h                   |   3 +-
 src/include/optimizer/planner.h                |   3 +-
 10 files changed, 253 insertions(+), 32 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387..a211aa9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7852,3 +7852,135 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 RESET enable_partitionwise_join;
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  HashAggregate
+         Group Key: fpagg_tab_p1.a
+         Filter: (avg(fpagg_tab_p1.b) < '22'::numeric)
+         ->  Append
+               ->  Foreign Scan on fpagg_tab_p1
+               ->  Foreign Scan on fpagg_tab_p2
+               ->  Foreign Scan on fpagg_tab_p3
+(9 rows)
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  Append
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab)
+(9 rows)
+
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | sum  | min | count 
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
+ 10 | 2000 |   0 |   100
+ 11 | 2100 |   1 |   100
+ 20 | 2000 |   0 |   100
+ 21 | 2100 |   1 |   100
+(6 rows)
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Sort
+   Output: t1.a, (count(((t1.*)::pagg_tab)))
+   Sort Key: t1.a
+   ->  Append
+         ->  HashAggregate
+               Output: t1.a, count(((t1.*)::pagg_tab))
+               Group Key: t1.a
+               Filter: (avg(t1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p1 t1
+                     Output: t1.a, t1.*, t1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
+         ->  HashAggregate
+               Output: t1_1.a, count(((t1_1.*)::pagg_tab))
+               Group Key: t1_1.a
+               Filter: (avg(t1_1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p2 t1_1
+                     Output: t1_1.a, t1_1.*, t1_1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
+         ->  HashAggregate
+               Output: t1_2.a, count(((t1_2.*)::pagg_tab))
+               Group Key: t1_2.a
+               Filter: (avg(t1_2.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p3 t1_2
+                     Output: t1_2.a, t1_2.*, t1_2.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
+(25 rows)
+
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | count 
+----+-------
+  0 |   100
+  1 |   100
+ 10 |   100
+ 11 |   100
+ 20 |   100
+ 21 |   100
+(6 rows)
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.b
+   ->  Finalize HashAggregate
+         Group Key: fpagg_tab_p1.b
+         Filter: (sum(fpagg_tab_p1.a) < 700)
+         ->  Append
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p1.b
+                     ->  Foreign Scan on fpagg_tab_p1
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p2.b
+                     ->  Foreign Scan on fpagg_tab_p2
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p3.b
+                     ->  Foreign Scan on fpagg_tab_p3
+(15 rows)
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e8a0d54..42756aa 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -352,7 +352,8 @@ static bool postgresRecheckForeignScan(ForeignScanState *node,
 static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 UpperRelationKind stage,
 							 RelOptInfo *input_rel,
-							 RelOptInfo *output_rel);
+							 RelOptInfo *output_rel,
+							 void *extra);
 
 /*
  * Helper functions
@@ -427,7 +428,8 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 								Path *epq_path);
 static void add_foreign_grouping_paths(PlannerInfo *root,
 						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel);
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -2775,7 +2777,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		else if (IS_UPPER_REL(foreignrel))
 		{
 			PgFdwRelationInfo *ofpinfo;
-			PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+			PathTarget *ptarget = fpinfo->grouped_target;
 			AggClauseCosts aggcosts;
 			double		input_rows;
 			int			numGroupCols;
@@ -2805,7 +2807,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			{
 				get_agg_clause_costs(root, (Node *) fpinfo->grouped_tlist,
 									 AGGSPLIT_SIMPLE, &aggcosts);
-				get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+				get_agg_clause_costs(root, fpinfo->havingQual,
 									 AGGSPLIT_SIMPLE, &aggcosts);
 			}
 
@@ -5020,8 +5022,8 @@ static bool
 foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 {
 	Query	   *query = root->parse;
-	PathTarget *grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) grouped_rel->fdw_private;
+	PathTarget *grouping_target = fpinfo->grouped_target;
 	PgFdwRelationInfo *ofpinfo;
 	List	   *aggvars;
 	ListCell   *lc;
@@ -5131,11 +5133,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 	 * Classify the pushable and non-pushable HAVING clauses and save them in
 	 * remote_conds and local_conds of the grouped rel's fpinfo.
 	 */
-	if (root->hasHavingQual && query->havingQual)
+	if (root->hasHavingQual && fpinfo->havingQual)
 	{
 		ListCell   *lc;
 
-		foreach(lc, (List *) query->havingQual)
+		foreach(lc, (List *) fpinfo->havingQual)
 		{
 			Expr	   *expr = (Expr *) lfirst(lc);
 			RestrictInfo *rinfo;
@@ -5232,7 +5234,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
  */
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
-							 RelOptInfo *input_rel, RelOptInfo *output_rel)
+							 RelOptInfo *input_rel, RelOptInfo *output_rel,
+							 void *extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5252,7 +5255,8 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel);
+	add_foreign_grouping_paths(root, input_rel, output_rel,
+							   (GroupPathExtraData *) extra);
 }
 
 /*
@@ -5264,13 +5268,13 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
  */
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel)
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
 	PgFdwRelationInfo *fpinfo = grouped_rel->fdw_private;
 	ForeignPath *grouppath;
-	PathTarget *grouping_target;
 	double		rows;
 	int			width;
 	Cost		startup_cost;
@@ -5281,7 +5285,25 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		!root->hasHavingQual)
 		return;
 
-	grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
+	/*
+	 * Store passed-in target and havingQual in fpinfo. If its a foreign
+	 * partition, then path target and HAVING quals fetched from the root are
+	 * not correct as Vars within it won't match with this child relation.
+	 * However, server passed them through extra and thus fetch from it.
+	 */
+	if (extra)
+	{
+		/* Partial aggregates are not supported. */
+		Assert(extra->patype != PARTITIONWISE_AGGREGATE_PARTIAL);
+
+		fpinfo->grouped_target = extra->target;
+		fpinfo->havingQual = extra->havingQual;
+	}
+	else
+	{
+		fpinfo->grouped_target = root->upper_targets[UPPERREL_GROUP_AGG];
+		fpinfo->havingQual = parse->havingQual;
+	}
 
 	/* save the input_rel as outerrel in fpinfo */
 	fpinfo->outerrel = input_rel;
@@ -5312,7 +5334,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Create and add foreign path to the grouping relation. */
 	grouppath = create_foreignscan_path(root,
 										grouped_rel,
-										grouping_target,
+										fpinfo->grouped_target,
 										rows,
 										startup_cost,
 										total_cost,
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index d37cc88..0d3c675 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -94,6 +94,8 @@ typedef struct PgFdwRelationInfo
 
 	/* Grouping information */
 	List	   *grouped_tlist;
+	PathTarget *grouped_target;
+	Node	   *havingQual;
 
 	/* Subquery information */
 	bool		make_outerrel_subquery; /* do we deparse outerrel as a
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4d2e43c..cf32be4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1932,3 +1932,54 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
 
 RESET enable_partitionwise_join;
+
+
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 0ed3a47..25915a2 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -359,7 +359,8 @@ void
 GetForeignUpperPaths(PlannerInfo *root,
                      UpperRelationKind stage,
                      RelOptInfo *input_rel,
-                     RelOptInfo *output_rel);
+                     RelOptInfo *output_rel,
+                     void *extra);
 </programlisting>
      Create possible access paths for <firstterm>upper relation</firstterm> processing,
      which is the planner's term for all post-scan/join query processing, such
@@ -379,7 +380,10 @@ GetForeignUpperPaths(PlannerInfo *root,
      currently being considered.  <literal>output_rel</literal> is the upper relation
      that should receive paths representing computation of this step,
      and <literal>input_rel</literal> is the relation representing the input to this
-     step.  (Note that <structname>ForeignPath</structname> paths added
+     step.  The <literal>extra</literal> parameter provides additional details
+     that may be needed for path creation, such as child details in case of
+     partitioning.  The details passed through this parameter depend on the
+     <literal>stage</literal> parameter.  (Note that <structname>ForeignPath</structname> paths added
      to <literal>output_rel</literal> would typically not have any direct dependency
      on paths of the <literal>input_rel</literal>, since their processing is expected
      to be done externally.  However, examining paths previously generated for
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031..a54cc96 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3517,10 +3517,11 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 
 	/*
 	 * Likewise, copy the relids that are represented by this foreign scan. An
-	 * upper rel doesn't have relids set, but it covers all the base relations
-	 * participating in the underlying scan, so use root's all_baserels.
+	 * upper rel (but not the other upper rel) doesn't have relids set, but it
+	 * covers all the base relations participating in the underlying scan, so
+	 * use root's all_baserels.
 	 */
-	if (IS_UPPER_REL(rel))
+	if (IS_UPPER_REL(rel) && !IS_OTHER_REL(rel))
 		scan_plan->fs_relids = root->all_baserels;
 	else
 		scan_plan->fs_relids = best_path->path.parent->relids;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 50f858e..9642489 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2203,12 +2203,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	if (final_rel->fdwroutine &&
 		final_rel->fdwroutine->GetForeignUpperPaths)
 		final_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_FINAL,
-													current_rel, final_rel);
+													current_rel, final_rel,
+													NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_FINAL,
-									current_rel, final_rel);
+									current_rel, final_rel, NULL);
 
 	/* Note: currently, we leave it to callers to do set_cheapest() */
 }
@@ -4022,12 +4023,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	if (grouped_rel->fdwroutine &&
 		grouped_rel->fdwroutine->GetForeignUpperPaths)
 		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
-													  input_rel, grouped_rel);
+													  input_rel, grouped_rel,
+													  extra);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_GROUP_AGG,
-									input_rel, grouped_rel);
+									input_rel, grouped_rel,
+									extra);
 }
 
 /*
@@ -4459,12 +4462,13 @@ create_window_paths(PlannerInfo *root,
 	if (window_rel->fdwroutine &&
 		window_rel->fdwroutine->GetForeignUpperPaths)
 		window_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_WINDOW,
-													 input_rel, window_rel);
+													 input_rel, window_rel,
+													 NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_WINDOW,
-									input_rel, window_rel);
+									input_rel, window_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
@@ -4763,12 +4767,13 @@ create_distinct_paths(PlannerInfo *root,
 	if (distinct_rel->fdwroutine &&
 		distinct_rel->fdwroutine->GetForeignUpperPaths)
 		distinct_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_DISTINCT,
-													   input_rel, distinct_rel);
+													   input_rel, distinct_rel,
+													   NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_DISTINCT,
-									input_rel, distinct_rel);
+									input_rel, distinct_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
@@ -4906,12 +4911,13 @@ create_ordered_paths(PlannerInfo *root,
 	if (ordered_rel->fdwroutine &&
 		ordered_rel->fdwroutine->GetForeignUpperPaths)
 		ordered_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_ORDERED,
-													  input_rel, ordered_rel);
+													  input_rel, ordered_rel,
+													  NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_ORDERED,
-									input_rel, ordered_rel);
+									input_rel, ordered_rel, NULL);
 
 	/*
 	 * No need to bother with set_cheapest here; grouping_planner does not
@@ -6692,7 +6698,8 @@ create_partial_grouping_paths(PlannerInfo *root,
 
 		fdwroutine->GetForeignUpperPaths(root,
 										 UPPERREL_PARTIAL_GROUP_AGG,
-										 input_rel, partially_grouped_rel);
+										 input_rel, partially_grouped_rel,
+										 extra);
 	}
 
 	return partially_grouped_rel;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9..5236ab3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1032,7 +1032,7 @@ postprocess_setop_rel(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_SETOP,
-									NULL, rel);
+									NULL, rel, NULL);
 
 	/* Select cheapest path */
 	set_cheapest(rel);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index e88fee3..ea83c7b 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
 											   UpperRelationKind stage,
 											   RelOptInfo *input_rel,
-											   RelOptInfo *output_rel);
+											   RelOptInfo *output_rel,
+											   void *extra);
 
 typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
 												  RangeTblEntry *target_rte,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d..07a3bc0 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -28,7 +28,8 @@ extern PGDLLIMPORT planner_hook_type planner_hook;
 typedef void (*create_upper_paths_hook_type) (PlannerInfo *root,
 											  UpperRelationKind stage,
 											  RelOptInfo *input_rel,
-											  RelOptInfo *output_rel);
+											  RelOptInfo *output_rel,
+											  void *extra);
 extern PGDLLIMPORT create_upper_paths_hook_type create_upper_paths_hook;
 
 
-- 
1.8.3.1

#142

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#141)

1 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi Robert,

On pgsql-committers, Andres reported a concern about a test case failing
under installcheck with local settings.
(Sorry, I am not subscribed to that mailing list and thus am not able to
reply there.)

Attached is a patch which fixes that.

However, I am not sure whether a stable regression run is expected under
installcheck with local settings.
For example, if I have enable_hashagg = false locally, I will definitely
see failures.

ISTM that I am missing Andres's point here.
Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Attachments:

fix_pwa_andres_concern.patch (text/x-patch; charset=US-ASCII)

diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index bf8272e..76a8209 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -6,6 +6,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 --
 -- Tests for list partitioned tables.
 --
@@ -921,6 +923,8 @@ ALTER TABLE pagg_tab_ml_p3 ATTACH PARTITION pagg_tab_ml_p3_s1 FOR VALUES FROM (0
 ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i;
 ANALYZE pagg_tab_ml;
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -1146,7 +1150,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 (12 rows)
 
 -- Parallelism within partitionwise aggregates
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index f7b5f5a..c60d7d2 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -7,6 +7,8 @@
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
 SET enable_partitionwise_join TO true;
+-- Disable parallel plans.
+SET max_parallel_workers_per_gather TO 0;
 
 --
 -- Tests for list partitioned tables.
@@ -206,6 +208,9 @@ ALTER TABLE pagg_tab_ml ATTACH PARTITION pagg_tab_ml_p3 FOR VALUES FROM (20) TO
 INSERT INTO pagg_tab_ml SELECT i % 30, i % 10, to_char(i % 4, 'FM0000') FROM generate_series(0, 29999) i;
 ANALYZE pagg_tab_ml;
 
+-- For Parallel Append
+SET max_parallel_workers_per_gather TO 2;
+
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, but still we do not see a partial aggregation as array_agg()
@@ -238,7 +243,6 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
 
 -- Parallelism within partitionwise aggregates
 
-SET max_parallel_workers_per_gather TO 2;
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;

#143

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#141)

Re: [HACKERS] Partition-wise aggregation/grouping

On Fri, Mar 23, 2018 at 4:35 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Changes related to postgres_fdw which allow pushing aggregates to the
foreign server are not yet committed. Due to this, we will end up getting an
error when we have foreign partitions + aggregation.

The attached 0001 patch (0006 from my earlier patch-set) adds support
for this, and thus we will no longer get an error.

         else if (IS_UPPER_REL(foreignrel))
         {
             PgFdwRelationInfo *ofpinfo;
-            PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+            PathTarget *ptarget = fpinfo->grouped_target;

I think we need an assert there to make sure that the upper relation is a
grouping relation. That way any future push down will notice it.

-                get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+                get_agg_clause_costs(root, fpinfo->havingQual,
                                      AGGSPLIT_SIMPLE, &aggcosts);
             }
Should we pass agg costs as well through GroupPathExtraData to avoid
calculating it again in this function?

 /*
+    /*
+     * Store passed-in target and havingQual in fpinfo. If its a foreign
+     * partition, then path target and HAVING quals fetched from the root are
+     * not correct as Vars within it won't match with this child relation.
+     * However, server passed them through extra and thus fetch from it.
+     */
+    if (extra)
+    {
+        /* Partial aggregates are not supported. */
+        Assert(extra->patype != PARTITIONWISE_AGGREGATE_PARTIAL);
+
+        fpinfo->grouped_target = extra->target;
+        fpinfo->havingQual = extra->havingQual;
+    }
+    else
+    {
+        fpinfo->grouped_target = root->upper_targets[UPPERREL_GROUP_AGG];
+        fpinfo->havingQual = parse->havingQual;
+    }
I think that in both these cases extra should always be present, whether for
a child relation or a parent relation. Just pick from extra always.

/* Grouping information */
List *grouped_tlist;
+ PathTarget *grouped_target;

We should use the target stored in the grouped rel directly.

+ Node *havingQual;
I am wondering whether we could use remote_conds member for storing this.

     /*
      * Likewise, copy the relids that are represented by this foreign scan. An
-     * upper rel doesn't have relids set, but it covers all the base relations
-     * participating in the underlying scan, so use root's all_baserels.
+     * upper rel (but not the other upper rel) doesn't have relids set, but it
+     * covers all the base relations participating in the underlying scan, so
+     * use root's all_baserels.
      */

This is correct only for "other" grouping relations. We are yet to decide
what to do for the other upper relations.

-    if (IS_UPPER_REL(rel))
+    if (IS_UPPER_REL(rel) && !IS_OTHER_REL(rel))
I guess, this condition boils down to rel->reloptkind == RELOPT_UPPER_REL. Use
it that way?
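
The equivalence suggested above can be checked mechanically. The sketch below assumes the RelOptKind values and the IS_UPPER_REL / IS_OTHER_REL macro definitions as I recall them from the planner headers of that period; they are reproduced here as assumptions, not quoted from this thread, and RelOptInfo is reduced to a one-field stand-in.

```c
#include <assert.h>				/* used by the check shown after the block */
#include <stdbool.h>

/* Assumed copy of the planner's relation kinds (not taken from this thread). */
typedef enum RelOptKind
{
	RELOPT_BASEREL,
	RELOPT_JOINREL,
	RELOPT_OTHER_MEMBER_REL,
	RELOPT_OTHER_JOINREL,
	RELOPT_UPPER_REL,
	RELOPT_OTHER_UPPER_REL,
	RELOPT_DEADREL
} RelOptKind;

/* Stand-in for RelOptInfo, reduced to the one field the macros read. */
typedef struct RelOptInfo
{
	RelOptKind	reloptkind;
} RelOptInfo;

/* Assumed macro definitions, written from memory of the planner headers. */
#define IS_UPPER_REL(rel) \
	((rel)->reloptkind == RELOPT_UPPER_REL || \
	 (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)

#define IS_OTHER_REL(rel) \
	((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
	 (rel)->reloptkind == RELOPT_OTHER_JOINREL || \
	 (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)

/* The condition as written in the patch. */
static bool
patch_condition(const RelOptInfo *rel)
{
	return IS_UPPER_REL(rel) && !IS_OTHER_REL(rel);
}

/* The simpler spelling proposed in the review. */
static bool
suggested_condition(const RelOptInfo *rel)
{
	return rel->reloptkind == RELOPT_UPPER_REL;
}

/* Returns true if both spellings agree for every relation kind. */
static bool
conditions_agree_for_all_kinds(void)
{
	for (int kind = RELOPT_BASEREL; kind <= RELOPT_DEADREL; kind++)
	{
		RelOptInfo	rel = {(RelOptKind) kind};

		if (patch_condition(&rel) != suggested_condition(&rel))
			return false;
	}
	return true;
}
```

Under those assumed definitions, assert(conditions_agree_for_all_kinds()) holds, so the direct comparison against RELOPT_UPPER_REL expresses the same test with less indirection.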

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#144

Andres Freund

andres@anarazel.de

almost 8 years ago

In reply to: Jeevan Chalke (#142)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi,

On 2018-03-23 17:01:54 +0530, Jeevan Chalke wrote:

Attached patch which fixes that.

Thanks, will push. For the future, I'd be more likely to notice if you
CC me ;)

However, I am not sure whether it is expected to have stable regression run
with installcheck having local settings.
For example, if I have enable_hashagg = false locally, I will definitely
see failures.

ISTM, that I am missing Andres point here.

I don't think there's a hard and fast rule here. I personally often
during development disable parallelism because it makes some things
harder (you can't easily debug crashes with gdb, benchmarks show larger
variance, ...). There doesn't seem to be an equivalent benefit to
support running e.g. with enable_hashagg = false.

- Andres

#145

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Andres Freund (#144)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 27, 2018 at 3:33 AM, Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2018-03-23 17:01:54 +0530, Jeevan Chalke wrote:

Attached patch which fixes that.

Thanks, will push. For the future, I'd be more likely to notice if you
CC me ;)

Sure. Thanks.

However, I am not sure whether it is expected to have stable regression run
with installcheck having local settings.
For example, if I have enable_hashagg = false locally, I will definitely
see failures.

ISTM, that I am missing Andres point here.

I don't think there's a hard and fast rule here. I personally often
during development disable parallelism because it makes some things
harder (you can't easily debug crashes with gdb, benchmarks show larger
variance, ...).

Yep.

There doesn't seem to be an equivalent benefit to
support running e.g. with enable_hashagg = false.

OK.
Noted.

Thanks for the explanation.

- Andres

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#146

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#143)

2 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Mon, Mar 26, 2018 at 5:24 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Fri, Mar 23, 2018 at 4:35 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Changes related to postgres_fdw which allow pushing aggregates to the
foreign server are not yet committed. Due to this, we will end up getting an
error when we have foreign partitions + aggregation.

The attached 0001 patch (0006 from my earlier patch-set) adds support
for this, and thus we will no longer get an error.

I have observed that the target member in GroupPathExtraData is not needed,
as we store the target in grouped_rel itself.
The attached 0001 patch removes it.

         else if (IS_UPPER_REL(foreignrel))
         {
             PgFdwRelationInfo *ofpinfo;
-            PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+            PathTarget *ptarget = fpinfo->grouped_target;

I think we need an assert there to make sure that the upper relation is a
grouping relation. That way any future push down will notice it.

I am not sure what we should Assert here. Note that we end up here only
when doing grouping, and thus I don't think we need any Assert here.
Let me know if I missed anything.

-                get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+                get_agg_clause_costs(root, fpinfo->havingQual,
                                      AGGSPLIT_SIMPLE, &aggcosts);
             }
Should we pass agg costs as well through GroupPathExtraData to avoid
calculating it again in this function?

Adding an extra member to GroupPathExtraData just for the FDW does not look
good to me.
But yes, if we do that, then we can save this calculation.
Let me know if it's OK to have an extra member just for FDW use; if so, I
will prepare a separate patch for that.

/*
+    /*
+     * Store passed-in target and havingQual in fpinfo. If its a foreign
+     * partition, then path target and HAVING quals fetched from the root are
+     * not correct as Vars within it won't match with this child relation.
+     * However, server passed them through extra and thus fetch from it.
+     */
+    if (extra)
+    {
+        /* Partial aggregates are not supported. */
+        Assert(extra->patype != PARTITIONWISE_AGGREGATE_PARTIAL);
+
+        fpinfo->grouped_target = extra->target;
+        fpinfo->havingQual = extra->havingQual;
+    }
+    else
+    {
+        fpinfo->grouped_target = root->upper_targets[UPPERREL_GROUP_AGG];
+        fpinfo->havingQual = parse->havingQual;
+    }
I think in both these cases, extra should always be present, whether for a
child relation or a parent relation. Just pick from extra always.

Yes.
Done.

/* Grouping information */
List *grouped_tlist;
+ PathTarget *grouped_target;

We should use the target stored in the grouped rel directly.

Yep.

+ Node *havingQual;
I am wondering whether we could use remote_conds member for storing this.

This havingQual is later checked for shippability and classified into
pushable and non-pushable quals and stored in remote_conds and local_conds
respectively.
Storing it directly in remote_conds and then splitting it does not look
good to me.
Also, remote_conds is a list of RestrictInfo nodes whereas havingQual is not.
So using that for storing havingQual does not make sense. So better to have
a separate member in PgFdwRelationInfo.
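
To make the split described above concrete, here is a minimal, self-contained sketch of classifying HAVING quals into pushable and non-pushable lists. It is a simplified model and not postgres_fdw code: ModelQual, ModelFpInfo, MODEL_MAX_QUALS and classify_having_quals() are hypothetical names, and the precomputed shippable flag merely stands in for the real shippability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUALS 8       /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for one HAVING clause. */
typedef struct ModelQual
{
    const char *text;           /* clause text, for illustration only */
    bool        shippable;      /* assumed outcome of a shippability check */
} ModelQual;

/* Hypothetical stand-in for the two condition lists kept per relation. */
typedef struct ModelFpInfo
{
    const ModelQual *remote_conds[MODEL_MAX_QUALS];
    size_t      nremote;
    const ModelQual *local_conds[MODEL_MAX_QUALS];
    size_t      nlocal;
} ModelFpInfo;

/* Shippable quals go to remote_conds, everything else to local_conds. */
static void
classify_having_quals(const ModelQual *quals, size_t nquals,
                      ModelFpInfo *fpinfo)
{
    size_t      i;

    assert(nquals <= MODEL_MAX_QUALS);
    fpinfo->nremote = 0;
    fpinfo->nlocal = 0;

    for (i = 0; i < nquals; i++)
    {
        if (quals[i].shippable)
            fpinfo->remote_conds[fpinfo->nremote++] = &quals[i];
        else
            fpinfo->local_conds[fpinfo->nlocal++] = &quals[i];
    }
}
```

In this model the input array is read but never modified, so the unsplit quals and the two classified lists remain separate objects.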

/*
* Likewise, copy the relids that are represented by this foreign scan. An
-     * upper rel doesn't have relids set, but it covers all the base relations
-     * participating in the underlying scan, so use root's all_baserels.
+     * upper rel (but not the other upper rel) doesn't have relids set, but it
+     * covers all the base relations participating in the underlying scan, so
+     * use root's all_baserels.
*/

This is correct only for "other" grouping relations. We are yet to decide
what to do for the other upper relations.

I have removed this comment change, as the existing comments look good after
making the following change:

-    if (IS_UPPER_REL(rel))
+    if (IS_UPPER_REL(rel) && !IS_OTHER_REL(rel))
I guess this condition boils down to rel->reloptkind == RELOPT_UPPER_REL.
Use it that way?

Done.
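
The equivalence claimed above can be checked exhaustively with a small standalone model. This is only a sketch: the enum and the two macros below are assumed to mirror the definitions in src/include/nodes/relation.h at the time of this thread, the struct is cut down to the one field that matters, and old_check(), new_check() and checks_agree_for_all_kinds() are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the planner's RelOptKind; not the real header. */
typedef enum RelOptKind
{
    RELOPT_BASEREL,
    RELOPT_JOINREL,
    RELOPT_OTHER_MEMBER_REL,
    RELOPT_OTHER_JOINREL,
    RELOPT_UPPER_REL,
    RELOPT_OTHER_UPPER_REL,
    RELOPT_DEADREL
} RelOptKind;

/* Cut-down stand-in for RelOptInfo with only the field used here. */
typedef struct RelOptInfo
{
    RelOptKind  reloptkind;
} RelOptInfo;

/* Assumed macro definitions, reproduced for the model. */
#define IS_UPPER_REL(rel) \
    ((rel)->reloptkind == RELOPT_UPPER_REL || \
     (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)

#define IS_OTHER_REL(rel) \
    ((rel)->reloptkind == RELOPT_OTHER_MEMBER_REL || \
     (rel)->reloptkind == RELOPT_OTHER_JOINREL || \
     (rel)->reloptkind == RELOPT_OTHER_UPPER_REL)

/* The condition as first proposed. */
static bool
old_check(const RelOptInfo *rel)
{
    return IS_UPPER_REL(rel) && !IS_OTHER_REL(rel);
}

/* The simplified condition suggested in review. */
static bool
new_check(const RelOptInfo *rel)
{
    return rel->reloptkind == RELOPT_UPPER_REL;
}

/* Returns true when both forms agree for every kind in the model. */
static bool
checks_agree_for_all_kinds(void)
{
    int         k;

    for (k = RELOPT_BASEREL; k <= RELOPT_DEADREL; k++)
    {
        RelOptInfo  rel;

        rel.reloptkind = (RelOptKind) k;
        if (old_check(&rel) != new_check(&rel))
            return false;
    }
    return true;
}
```

Under these assumed definitions the only kind that satisfies IS_UPPER_REL without also satisfying IS_OTHER_REL is RELOPT_UPPER_REL, which is why the direct comparison is sufficient.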

Attached 0002 for this.

Thanks

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

0001-Remove-target-from-GroupPathExtraData-instead-fetch-.patch (text/x-patch; charset=US-ASCII)

From dabe9c41402cdc8dc4afd3bb01202de654e43bf4 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Tue, 27 Mar 2018 14:23:00 +0530
Subject: [PATCH 1/2] Remove target from GroupPathExtraData, instead fetch it
 from grouped_rel.

---
 src/backend/optimizer/plan/planner.c | 4 +---
 src/include/nodes/relation.h         | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6..3bd63f3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3731,7 +3731,6 @@ create_grouping_paths(PlannerInfo *root,
 			flags |= GROUPING_CAN_PARTIAL_AGG;
 
 		extra.flags = flags;
-		extra.target = target;
 		extra.target_parallel_safe = target_parallel_safe;
 		extra.havingQual = parse->havingQual;
 		extra.targetList = parse->targetList;
@@ -6906,7 +6905,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	int			cnt_parts;
 	List	   *grouped_live_children = NIL;
 	List	   *partially_grouped_live_children = NIL;
-	PathTarget *target = extra->target;
+	PathTarget *target = grouped_rel->reltarget;
 
 	Assert(patype != PARTITIONWISE_AGGREGATE_NONE);
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
@@ -6940,7 +6939,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			adjust_appendrel_attrs(root,
 								   (Node *) target->exprs,
 								   nappinfos, appinfos);
-		child_extra.target = child_target;
 
 		/* Translate havingQual and targetList. */
 		child_extra.havingQual = (Node *)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9..2b4f773 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2340,7 +2340,6 @@ typedef enum
  * 		have been initialized.
  * agg_partial_costs gives partial aggregation costs.
  * agg_final_costs gives finalization costs.
- * target is the PathTarget to be used while creating paths.
  * target_parallel_safe is true if target is parallel safe.
  * havingQual gives list of quals to be applied after aggregation.
  * targetList gives list of columns to be projected.
@@ -2355,7 +2354,6 @@ typedef struct
 	AggClauseCosts agg_final_costs;
 
 	/* Data which may differ across partitions. */
-	PathTarget *target;
 	bool		target_parallel_safe;
 	Node	   *havingQual;
 	List	   *targetList;
-- 
1.8.3.1

0002-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch (text/x-patch; charset=US-ASCII)

From 00f1f87d48ff1aebeb6e16305375faaa96b34684 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Tue, 27 Mar 2018 14:23:34 +0530
Subject: [PATCH 2/2] Teach postgres_fdw to push aggregates for child relations
 too.

GetForeignUpperPaths() now takes an extra void parameter which will
be used to pass any additional details required to create an upper
path at the remote server. However, we support only grouping over
remote server today and thus it passes grouping specific details i.e.
GroupPathExtraData, NULL otherwise.

Since we don't know how to get a partially aggregated result from a
remote server, only full aggregation is pushed on the remote server.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 132 +++++++++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.c            |  37 ++++---
 contrib/postgres_fdw/postgres_fdw.h            |   1 +
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  51 ++++++++++
 doc/src/sgml/fdwhandler.sgml                   |   8 +-
 src/backend/optimizer/plan/createplan.c        |   2 +-
 src/backend/optimizer/plan/planner.c           |  29 +++---
 src/backend/optimizer/prep/prepunion.c         |   2 +-
 src/include/foreign/fdwapi.h                   |   3 +-
 src/include/optimizer/planner.h                |   3 +-
 10 files changed, 238 insertions(+), 30 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387..a211aa9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7852,3 +7852,135 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 RESET enable_partitionwise_join;
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  HashAggregate
+         Group Key: fpagg_tab_p1.a
+         Filter: (avg(fpagg_tab_p1.b) < '22'::numeric)
+         ->  Append
+               ->  Foreign Scan on fpagg_tab_p1
+               ->  Foreign Scan on fpagg_tab_p2
+               ->  Foreign Scan on fpagg_tab_p3
+(9 rows)
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  Append
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab)
+(9 rows)
+
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | sum  | min | count 
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
+ 10 | 2000 |   0 |   100
+ 11 | 2100 |   1 |   100
+ 20 | 2000 |   0 |   100
+ 21 | 2100 |   1 |   100
+(6 rows)
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Sort
+   Output: t1.a, (count(((t1.*)::pagg_tab)))
+   Sort Key: t1.a
+   ->  Append
+         ->  HashAggregate
+               Output: t1.a, count(((t1.*)::pagg_tab))
+               Group Key: t1.a
+               Filter: (avg(t1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p1 t1
+                     Output: t1.a, t1.*, t1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
+         ->  HashAggregate
+               Output: t1_1.a, count(((t1_1.*)::pagg_tab))
+               Group Key: t1_1.a
+               Filter: (avg(t1_1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p2 t1_1
+                     Output: t1_1.a, t1_1.*, t1_1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
+         ->  HashAggregate
+               Output: t1_2.a, count(((t1_2.*)::pagg_tab))
+               Group Key: t1_2.a
+               Filter: (avg(t1_2.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p3 t1_2
+                     Output: t1_2.a, t1_2.*, t1_2.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
+(25 rows)
+
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | count 
+----+-------
+  0 |   100
+  1 |   100
+ 10 |   100
+ 11 |   100
+ 20 |   100
+ 21 |   100
+(6 rows)
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.b
+   ->  Finalize HashAggregate
+         Group Key: fpagg_tab_p1.b
+         Filter: (sum(fpagg_tab_p1.a) < 700)
+         ->  Append
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p1.b
+                     ->  Foreign Scan on fpagg_tab_p1
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p2.b
+                     ->  Foreign Scan on fpagg_tab_p2
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p3.b
+                     ->  Foreign Scan on fpagg_tab_p3
+(15 rows)
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e8a0d54..ec4416b 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -352,7 +352,8 @@ static bool postgresRecheckForeignScan(ForeignScanState *node,
 static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 UpperRelationKind stage,
 							 RelOptInfo *input_rel,
-							 RelOptInfo *output_rel);
+							 RelOptInfo *output_rel,
+							 void *extra);
 
 /*
  * Helper functions
@@ -427,7 +428,8 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 								Path *epq_path);
 static void add_foreign_grouping_paths(PlannerInfo *root,
 						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel);
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -2775,7 +2777,7 @@ estimate_path_cost_size(PlannerInfo *root,
 		else if (IS_UPPER_REL(foreignrel))
 		{
 			PgFdwRelationInfo *ofpinfo;
-			PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+			PathTarget *ptarget = foreignrel->reltarget;
 			AggClauseCosts aggcosts;
 			double		input_rows;
 			int			numGroupCols;
@@ -2805,7 +2807,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			{
 				get_agg_clause_costs(root, (Node *) fpinfo->grouped_tlist,
 									 AGGSPLIT_SIMPLE, &aggcosts);
-				get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+				get_agg_clause_costs(root, fpinfo->havingQual,
 									 AGGSPLIT_SIMPLE, &aggcosts);
 			}
 
@@ -5020,8 +5022,8 @@ static bool
 foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 {
 	Query	   *query = root->parse;
-	PathTarget *grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) grouped_rel->fdw_private;
+	PathTarget *grouping_target = grouped_rel->reltarget;
 	PgFdwRelationInfo *ofpinfo;
 	List	   *aggvars;
 	ListCell   *lc;
@@ -5131,11 +5133,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 	 * Classify the pushable and non-pushable HAVING clauses and save them in
 	 * remote_conds and local_conds of the grouped rel's fpinfo.
 	 */
-	if (root->hasHavingQual && query->havingQual)
+	if (root->hasHavingQual && fpinfo->havingQual)
 	{
 		ListCell   *lc;
 
-		foreach(lc, (List *) query->havingQual)
+		foreach(lc, (List *) fpinfo->havingQual)
 		{
 			Expr	   *expr = (Expr *) lfirst(lc);
 			RestrictInfo *rinfo;
@@ -5232,7 +5234,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
  */
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
-							 RelOptInfo *input_rel, RelOptInfo *output_rel)
+							 RelOptInfo *input_rel, RelOptInfo *output_rel,
+							 void *extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5252,7 +5255,8 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel);
+	add_foreign_grouping_paths(root, input_rel, output_rel,
+							   (GroupPathExtraData *) extra);
 }
 
 /*
@@ -5264,13 +5268,13 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
  */
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel)
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
 	PgFdwRelationInfo *fpinfo = grouped_rel->fdw_private;
 	ForeignPath *grouppath;
-	PathTarget *grouping_target;
 	double		rows;
 	int			width;
 	Cost		startup_cost;
@@ -5281,7 +5285,14 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		!root->hasHavingQual)
 		return;
 
-	grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
+	Assert(extra->patype == PARTITIONWISE_AGGREGATE_NONE ||
+		   extra->patype == PARTITIONWISE_AGGREGATE_FULL);
+
+	/*
+	 * Get HAVING qual from extra. In case of child partition, it will have
+	 * translated Vars.
+	 */
+	fpinfo->havingQual = extra->havingQual;
 
 	/* save the input_rel as outerrel in fpinfo */
 	fpinfo->outerrel = input_rel;
@@ -5312,7 +5323,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Create and add foreign path to the grouping relation. */
 	grouppath = create_foreignscan_path(root,
 										grouped_rel,
-										grouping_target,
+										grouped_rel->reltarget,
 										rows,
 										startup_cost,
 										total_cost,
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index d37cc88..488c538 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -94,6 +94,7 @@ typedef struct PgFdwRelationInfo
 
 	/* Grouping information */
 	List	   *grouped_tlist;
+	Node	   *havingQual;
 
 	/* Subquery information */
 	bool		make_outerrel_subquery; /* do we deparse outerrel as a
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4d2e43c..cf32be4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1932,3 +1932,54 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
 
 RESET enable_partitionwise_join;
+
+
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 0ed3a47..25915a2 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -359,7 +359,8 @@ void
 GetForeignUpperPaths(PlannerInfo *root,
                      UpperRelationKind stage,
                      RelOptInfo *input_rel,
-                     RelOptInfo *output_rel);
+                     RelOptInfo *output_rel,
+                     void *extra);
 </programlisting>
      Create possible access paths for <firstterm>upper relation</firstterm> processing,
      which is the planner's term for all post-scan/join query processing, such
@@ -379,7 +380,10 @@ GetForeignUpperPaths(PlannerInfo *root,
      currently being considered.  <literal>output_rel</literal> is the upper relation
      that should receive paths representing computation of this step,
      and <literal>input_rel</literal> is the relation representing the input to this
-     step.  (Note that <structname>ForeignPath</structname> paths added
+     step.  <literal>extra</literal> parameter provides additional details which may
+     be needed for the paths creation, like child details in case of partitioning.
+     The details passed through this parameter is depends on the <literal>stage</literal>
+     parameter. (Note that <structname>ForeignPath</structname> paths added
      to <literal>output_rel</literal> would typically not have any direct dependency
      on paths of the <literal>input_rel</literal>, since their processing is expected
      to be done externally.  However, examining paths previously generated for
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031..8bd15c9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3520,7 +3520,7 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * upper rel doesn't have relids set, but it covers all the base relations
 	 * participating in the underlying scan, so use root's all_baserels.
 	 */
-	if (IS_UPPER_REL(rel))
+	if (rel->reloptkind == RELOPT_UPPER_REL)
 		scan_plan->fs_relids = root->all_baserels;
 	else
 		scan_plan->fs_relids = best_path->path.parent->relids;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3bd63f3..30d6c28 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2205,12 +2205,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	if (final_rel->fdwroutine &&
 		final_rel->fdwroutine->GetForeignUpperPaths)
 		final_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_FINAL,
-													current_rel, final_rel);
+													current_rel, final_rel,
+													NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_FINAL,
-									current_rel, final_rel);
+									current_rel, final_rel, NULL);
 
 	/* Note: currently, we leave it to callers to do set_cheapest() */
 }
@@ -4023,12 +4024,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	if (grouped_rel->fdwroutine &&
 		grouped_rel->fdwroutine->GetForeignUpperPaths)
 		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
-													  input_rel, grouped_rel);
+													  input_rel, grouped_rel,
+													  extra);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_GROUP_AGG,
-									input_rel, grouped_rel);
+									input_rel, grouped_rel,
+									extra);
 }
 
 /*
@@ -4460,12 +4463,13 @@ create_window_paths(PlannerInfo *root,
 	if (window_rel->fdwroutine &&
 		window_rel->fdwroutine->GetForeignUpperPaths)
 		window_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_WINDOW,
-													 input_rel, window_rel);
+													 input_rel, window_rel,
+													 NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_WINDOW,
-									input_rel, window_rel);
+									input_rel, window_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
@@ -4764,12 +4768,13 @@ create_distinct_paths(PlannerInfo *root,
 	if (distinct_rel->fdwroutine &&
 		distinct_rel->fdwroutine->GetForeignUpperPaths)
 		distinct_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_DISTINCT,
-													   input_rel, distinct_rel);
+													   input_rel, distinct_rel,
+													   NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_DISTINCT,
-									input_rel, distinct_rel);
+									input_rel, distinct_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
@@ -4907,12 +4912,13 @@ create_ordered_paths(PlannerInfo *root,
 	if (ordered_rel->fdwroutine &&
 		ordered_rel->fdwroutine->GetForeignUpperPaths)
 		ordered_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_ORDERED,
-													  input_rel, ordered_rel);
+													  input_rel, ordered_rel,
+													  NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_ORDERED,
-									input_rel, ordered_rel);
+									input_rel, ordered_rel, NULL);
 
 	/*
 	 * No need to bother with set_cheapest here; grouping_planner does not
@@ -6693,7 +6699,8 @@ create_partial_grouping_paths(PlannerInfo *root,
 
 		fdwroutine->GetForeignUpperPaths(root,
 										 UPPERREL_PARTIAL_GROUP_AGG,
-										 input_rel, partially_grouped_rel);
+										 input_rel, partially_grouped_rel,
+										 extra);
 	}
 
 	return partially_grouped_rel;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9..5236ab3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1032,7 +1032,7 @@ postprocess_setop_rel(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_SETOP,
-									NULL, rel);
+									NULL, rel, NULL);
 
 	/* Select cheapest path */
 	set_cheapest(rel);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index e88fee3..ea83c7b 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
 											   UpperRelationKind stage,
 											   RelOptInfo *input_rel,
-											   RelOptInfo *output_rel);
+											   RelOptInfo *output_rel,
+											   void *extra);
 
 typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
 												  RangeTblEntry *target_rte,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d..07a3bc0 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -28,7 +28,8 @@ extern PGDLLIMPORT planner_hook_type planner_hook;
 typedef void (*create_upper_paths_hook_type) (PlannerInfo *root,
 											  UpperRelationKind stage,
 											  RelOptInfo *input_rel,
-											  RelOptInfo *output_rel);
+											  RelOptInfo *output_rel,
+											  void *extra);
 extern PGDLLIMPORT create_upper_paths_hook_type create_upper_paths_hook;
 
 
-- 
1.8.3.1
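
The expected rows in the 0002 regression test (GROUP BY a HAVING avg(b) < 22) can be re-derived from the INSERT statements without a server. The sketch below recomputes the aggregates for the data generated by generate_series(1, 3000) with a = i % 30 and b = i % 50. GroupAgg, aggregate_group(), passes_having() and count_passing_groups() are hypothetical helper names used only for this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-group aggregates as selected by the test query. */
typedef struct GroupAgg
{
    long        sum_b;
    int         min_b;
    int         count;
} GroupAgg;

/* Aggregate b = i % 50 over all i in 1..3000 with i % 30 == a. */
static GroupAgg
aggregate_group(int a)
{
    GroupAgg    g = {0, 50, 0};     /* b is always below 50 */
    int         i;

    for (i = 1; i <= 3000; i++)
    {
        int         b;

        if (i % 30 != a)
            continue;
        b = i % 50;
        g.sum_b += b;
        if (b < g.min_b)
            g.min_b = b;
        g.count++;
    }
    return g;
}

/* HAVING avg(b) < 22, written as sum < 22 * count to stay in integers. */
static bool
passes_having(GroupAgg g)
{
    return g.count > 0 && g.sum_b < 22L * g.count;
}

/* Number of groups a in 0..29 that survive the HAVING filter. */
static int
count_passing_groups(void)
{
    int         a;
    int         n = 0;

    for (a = 0; a < 30; a++)
    {
        if (passes_having(aggregate_group(a)))
            n++;
    }
    return n;
}
```

Each group holds 100 rows whose b values cycle through five residues, so the average depends on a in a simple way and only a = 0, 1, 10, 11, 20 and 21 stay below 22, matching the six rows in the expected output.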

#147

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Jeevan Chalke (#146)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Mar 27, 2018 at 2:43 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

else if (IS_UPPER_REL(foreignrel))
{
PgFdwRelationInfo *ofpinfo;
-            PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+            PathTarget *ptarget = fpinfo->grouped_target;
I think we need an assert there to make sure that the upper relation is a
grouping relation. That way any future push down will notice it.
I am not sure what we should Assert here. Note that we end up here only
when doing grouping, and thus I don't think we need any Assert here.
Let me know if I missed anything.

Since we are just checking whether it's an upper relation and directly
using root->upper_targets[UPPERREL_GROUP_AGG], I thought we could add
an assert to verify that it's really the grouping rel we are dealing
with. But I guess, we can't really check that from given relation. But
then for a grouped rel we can get its target from RelOptInfo. So, we
shouldn't need to use root->upper_targets[UPPERREL_GROUP_AGG]. Am I
missing something? For other upper relations we do not set the target
yet but then we could assert that there exists one in the grouped
relation.

-                get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+                get_agg_clause_costs(root, fpinfo->havingQual,
AGGSPLIT_SIMPLE, &aggcosts);
}
Should we pass agg costs as well through GroupPathExtraData to avoid
calculating it again in this function?
Adding an extra member in GroupPathExtraData just for FDW does not look good
to me.
But yes, if we do that, then we can save this calculation.
Let me know if it's OK to have an extra member for just FDW use; I will prepare
a separate patch for that.

I think that should be fine. A separate patch would be good, so that a
committer can decide whether or not to include it.

+ Node *havingQual;
I am wondering whether we could use remote_conds member for storing this.

This havingQual is later checked for shippability and classified into
pushable and non-pushable quals and stored in remote_conds and local_conds
respectively.
Storing it directly in remote_conds and then splitting it does not look good
to me.
Also, remote_conds is a list of RestrictInfo nodes whereas havingQual is not.
So using that for storing havingQual does not make sense. So better to have
a separate member in PgFdwRelationInfo.

Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
are basically the conditions on the relation being pushed down.
havingQuals are conditions on a grouped relation so treating them like
baserestrictinfo or join conditions looks more straightforward,
rather than having a separate member in PgFdwRelationInfo. So, remote
havingQuals go into remote_conds and local havingQuals go to
local_conds.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#148

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#147)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
are basically the conditions on the relation being pushed down.
havingQuals are conditions on a grouped relation so treating them like
baserestrictinfo or join conditions looks more straightforward,
rather than having a separate member in PgFdwRelationInfo. So, remote
havingQuals go into remote_conds and local havingQuals go to
local_conds.

Looks like we already do that. Then we have remote_conds and local_conds,
which together should be equivalent to havingQual. Storing all three
doesn't make sense. In the future someone may use havingQual instead
of remote_conds/local_conds just because it's available, and then there
is a risk of these three lists going out of sync.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#149

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#147)

3 attachment(s)

Re: [HACKERS] Partition-wise aggregation/grouping

On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat <
ashutosh.bapat@enterprisedb.com> wrote:

On Tue, Mar 27, 2018 at 2:43 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

I am not sure what we should Assert here. Note that we end up here only
when doing grouping, and thus I don't think we need any Assert here.
Let me know if I missed anything.

Since we are just checking whether it's an upper relation and directly
using root->upper_targets[UPPERREL_GROUP_AGG], I thought we could add
an assert to verify that it's really the grouping rel we are dealing
with. But I guess, we can't really check that from given relation. But
then for a grouped rel we can get its target from RelOptInfo. So, we
shouldn't need to use root->upper_targets[UPPERREL_GROUP_AGG]. Am I
missing something? For other upper relations we do not set the target
yet but then we could assert that there exists one in the grouped
relation.

Yes. We fetch target from the grouped_rel itself.
Added Assert() per our off-list discussion.
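
To illustrate the idea of reading the target from the grouped rel and asserting that it is set, here is a minimal standalone sketch. The structs are simplified stand-ins for the planner's PathTarget and RelOptInfo, and grouped_rel_target() is a hypothetical helper that does not exist in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real planner structs carry many more fields. */
typedef struct PathTarget
{
	int			width;
} PathTarget;

typedef struct RelOptInfo
{
	PathTarget *reltarget;		/* target set when the grouped rel is built */
} RelOptInfo;

/*
 * Hypothetical helper: return the target of a grouped rel.  A child grouped
 * rel carries its own translated target, so reading it from the rel works
 * for parent and child alike.
 */
static PathTarget *
grouped_rel_target(const RelOptInfo *grouped_rel)
{
	PathTarget *ptarget = grouped_rel->reltarget;

	/* Grouping always sets reltarget, so it must be present here. */
	assert(ptarget != NULL);
	return ptarget;
}
```

A caller holding a child grouped rel gets the child's own target back, which is why no lookup through root is needed.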

-                get_agg_clause_costs(root, (Node *) root->parse->havingQual,
+                get_agg_clause_costs(root, fpinfo->havingQual,
                                     AGGSPLIT_SIMPLE, &aggcosts);
}
Should we pass agg costs as well through GroupPathExtraData to avoid
calculating it again in this function?
Adding an extra member in GroupPathExtraData just for FDW does not look
good to me.
But yes, if we do that, then we can save this calculation.
Let me know if it's OK to have an extra member for just FDW use; I will
prepare a separate patch for that.

I think that should be fine. A separate patch would be good, so that a
committer can decide whether or not to include it.

Attached patch 0003 for this.
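
As a rough illustration of what carrying the aggregate costs in the extra struct saves, here is a small standalone sketch. AggCosts, GroupExtra, compute_agg_costs() and estimate_group_cost() are hypothetical simplified names, and the cost numbers are made up; the only point is that the costs are computed once and then reused for every estimate.

```c
#include <assert.h>

/* Simplified stand-in for AggClauseCosts. */
typedef struct AggCosts
{
	double		trans_cost;		/* per-input-row transition cost */
	double		final_cost;		/* one-time finalization cost */
} AggCosts;

/* Simplified stand-in for GroupPathExtraData with the costs carried along. */
typedef struct GroupExtra
{
	AggCosts	agg_costs;
} GroupExtra;

/* Counts how often the (notionally expensive) cost walk runs. */
static int	cost_calls = 0;

/* Compute aggregate costs once; the formula is arbitrary for illustration. */
static AggCosts
compute_agg_costs(int num_aggs)
{
	AggCosts	costs;

	assert(num_aggs >= 0);
	cost_calls++;
	costs.trans_cost = 0.25 * num_aggs;
	costs.final_cost = 0.5 * num_aggs;
	return costs;
}

/* Estimate using the precomputed costs instead of recalculating them. */
static double
estimate_group_cost(const GroupExtra *extra, double input_rows)
{
	return extra->agg_costs.trans_cost * input_rows +
		extra->agg_costs.final_cost;
}
```

Calling estimate_group_cost() for several relations with the same GroupExtra leaves cost_calls at 1.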

+ Node *havingQual;
I am wondering whether we could use remote_conds member for storing this.

This havingQual is later checked for shippability and classified into
pushable and non-pushable quals and stored in remote_conds and local_conds
respectively.
Storing it directly in remote_conds and then splitting it does not look good
to me.
Also, remote_conds is list of RestrictInfo nodes whereas havingQual is not.
So using that for storing havingQual does not make sense. So better to have
a separate member in PgFdwRelationInfo.

Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
are basically the conditions on the relation being pushed down.
havingQuals are conditions on a grouped relation so treating them like
baserestrictinfo or join conditions looks more straightforward,
rather than having a separate member in PgFdwRelationInfo. So, remote
havingQuals go into remote_conds and local havingQuals go to
local_conds.

OK. Agreed.
In this version, I have not added anything to PgFdwRelationInfo.
The HAVING qual is needed in two places: (1) in foreign_grouping_ok() to check
shippability, so the translated HAVING qual is passed to it as a parameter,
and (2) when estimating aggregate costs in estimate_path_cost_size(), where we
can use havingQual from root itself, as the costs won't change between parent
and child.
Thus there is no need to store havingQual in PgFdwRelationInfo.
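
For readers following along, the classification step discussed above can be sketched in isolation as below. This is only an illustration under simplifying assumptions: Qual, CondList and classify_having_quals() are hypothetical names, and the shippable flag stands in for the real shippability check done in postgres_fdw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUALS 8

/* Simplified stand-in for one HAVING clause. */
typedef struct Qual
{
	const char *text;
	bool		shippable;		/* assumed precomputed for this sketch */
} Qual;

/* Simplified stand-in for a List of conditions. */
typedef struct CondList
{
	const Qual *items[MAX_QUALS];
	size_t		len;
} CondList;

/*
 * Split the HAVING quals passed in by the caller: shippable ones go to
 * remote_conds, the rest to local_conds.  The input is a parameter, so a
 * child rel can hand in its own translated quals.
 */
static void
classify_having_quals(const Qual *having, size_t n,
					  CondList *remote_conds, CondList *local_conds)
{
	assert(n <= MAX_QUALS);
	remote_conds->len = 0;
	local_conds->len = 0;

	for (size_t i = 0; i < n; i++)
	{
		CondList   *dest = having[i].shippable ? remote_conds : local_conds;

		dest->items[dest->len++] = &having[i];
	}
}
```

After the call, the two lists together hold every input qual exactly once, so nothing else has to remember the original HAVING list.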

Thanks

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

0001-Remove-target-from-GroupPathExtraData-instead-fetch-.patch (text/x-patch; charset=US-ASCII)

From 761d46b1b255a7521529bbe7d1b128c9d79d6b4e Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Tue, 27 Mar 2018 14:23:00 +0530
Subject: [PATCH 1/3] Remove target from GroupPathExtraData, instead fetch it
 from grouped_rel.

---
 src/backend/optimizer/plan/planner.c | 4 +---
 src/include/nodes/relation.h         | 2 --
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a19f5d0..d4acde6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3734,7 +3734,6 @@ create_grouping_paths(PlannerInfo *root,
 			flags |= GROUPING_CAN_PARTIAL_AGG;
 
 		extra.flags = flags;
-		extra.target = target;
 		extra.target_parallel_safe = target_parallel_safe;
 		extra.havingQual = parse->havingQual;
 		extra.targetList = parse->targetList;
@@ -6909,7 +6908,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	int			cnt_parts;
 	List	   *grouped_live_children = NIL;
 	List	   *partially_grouped_live_children = NIL;
-	PathTarget *target = extra->target;
+	PathTarget *target = grouped_rel->reltarget;
 
 	Assert(patype != PARTITIONWISE_AGGREGATE_NONE);
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
@@ -6943,7 +6942,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			adjust_appendrel_attrs(root,
 								   (Node *) target->exprs,
 								   nappinfos, appinfos);
-		child_extra.target = child_target;
 
 		/* Translate havingQual and targetList. */
 		child_extra.havingQual = (Node *)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9..2b4f773 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2340,7 +2340,6 @@ typedef enum
  * 		have been initialized.
  * agg_partial_costs gives partial aggregation costs.
  * agg_final_costs gives finalization costs.
- * target is the PathTarget to be used while creating paths.
  * target_parallel_safe is true if target is parallel safe.
  * havingQual gives list of quals to be applied after aggregation.
  * targetList gives list of columns to be projected.
@@ -2355,7 +2354,6 @@ typedef struct
 	AggClauseCosts agg_final_costs;
 
 	/* Data which may differ across partitions. */
-	PathTarget *target;
 	bool		target_parallel_safe;
 	Node	   *havingQual;
 	List	   *targetList;
-- 
1.8.3.1

0002-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch (text/x-patch; charset=US-ASCII)

From bdf615e1d8d06fc4b81ce80f538d61725ce04235 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Tue, 27 Mar 2018 14:23:34 +0530
Subject: [PATCH 2/3] Teach postgres_fdw to push aggregates for child relations
 too.

GetForeignUpperPaths() now takes an extra void parameter which will
be used to pass any additional details required to create an upper
path at the remote server. However, we support only grouping over
remote server today and thus it passes grouping specific details i.e.
GroupPathExtraData, NULL otherwise.

Since we don't know how to get a partially aggregated result from a
remote server, only full aggregation is pushed on the remote server.
---
 contrib/postgres_fdw/expected/postgres_fdw.out | 132 +++++++++++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.c            |  57 ++++++++---
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  51 ++++++++++
 doc/src/sgml/fdwhandler.sgml                   |   8 +-
 src/backend/optimizer/plan/createplan.c        |   2 +-
 src/backend/optimizer/plan/planner.c           |  29 +++---
 src/backend/optimizer/prep/prepunion.c         |   2 +-
 src/include/foreign/fdwapi.h                   |   3 +-
 src/include/optimizer/planner.h                |   3 +-
 9 files changed, 254 insertions(+), 33 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d6e387..a211aa9 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -7852,3 +7852,135 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 (14 rows)
 
 RESET enable_partitionwise_join;
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                      QUERY PLAN                       
+-------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  HashAggregate
+         Group Key: fpagg_tab_p1.a
+         Filter: (avg(fpagg_tab_p1.b) < '22'::numeric)
+         ->  Append
+               ->  Foreign Scan on fpagg_tab_p1
+               ->  Foreign Scan on fpagg_tab_p2
+               ->  Foreign Scan on fpagg_tab_p3
+(9 rows)
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                              QUERY PLAN                              
+----------------------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.a
+   ->  Append
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p1 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p2 pagg_tab)
+         ->  Foreign Scan
+               Relations: Aggregate on (public.fpagg_tab_p3 pagg_tab)
+(9 rows)
+
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | sum  | min | count 
+----+------+-----+-------
+  0 | 2000 |   0 |   100
+  1 | 2100 |   1 |   100
+ 10 | 2000 |   0 |   100
+ 11 | 2100 |   1 |   100
+ 20 | 2000 |   0 |   100
+ 21 | 2100 |   1 |   100
+(6 rows)
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Sort
+   Output: t1.a, (count(((t1.*)::pagg_tab)))
+   Sort Key: t1.a
+   ->  Append
+         ->  HashAggregate
+               Output: t1.a, count(((t1.*)::pagg_tab))
+               Group Key: t1.a
+               Filter: (avg(t1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p1 t1
+                     Output: t1.a, t1.*, t1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p1
+         ->  HashAggregate
+               Output: t1_1.a, count(((t1_1.*)::pagg_tab))
+               Group Key: t1_1.a
+               Filter: (avg(t1_1.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p2 t1_1
+                     Output: t1_1.a, t1_1.*, t1_1.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p2
+         ->  HashAggregate
+               Output: t1_2.a, count(((t1_2.*)::pagg_tab))
+               Group Key: t1_2.a
+               Filter: (avg(t1_2.b) < '22'::numeric)
+               ->  Foreign Scan on public.fpagg_tab_p3 t1_2
+                     Output: t1_2.a, t1_2.*, t1_2.b
+                     Remote SQL: SELECT a, b, c FROM public.pagg_tab_p3
+(25 rows)
+
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+ a  | count 
+----+-------
+  0 |   100
+  1 |   100
+ 10 |   100
+ 11 |   100
+ 20 |   100
+ 21 |   100
+(6 rows)
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Sort
+   Sort Key: fpagg_tab_p1.b
+   ->  Finalize HashAggregate
+         Group Key: fpagg_tab_p1.b
+         Filter: (sum(fpagg_tab_p1.a) < 700)
+         ->  Append
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p1.b
+                     ->  Foreign Scan on fpagg_tab_p1
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p2.b
+                     ->  Foreign Scan on fpagg_tab_p2
+               ->  Partial HashAggregate
+                     Group Key: fpagg_tab_p3.b
+                     ->  Foreign Scan on fpagg_tab_p3
+(15 rows)
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e8a0d54..dbebbda 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -352,7 +352,8 @@ static bool postgresRecheckForeignScan(ForeignScanState *node,
 static void postgresGetForeignUpperPaths(PlannerInfo *root,
 							 UpperRelationKind stage,
 							 RelOptInfo *input_rel,
-							 RelOptInfo *output_rel);
+							 RelOptInfo *output_rel,
+							 void *extra);
 
 /*
  * Helper functions
@@ -419,7 +420,8 @@ static void conversion_error_callback(void *arg);
 static bool foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel,
 				JoinType jointype, RelOptInfo *outerrel, RelOptInfo *innerrel,
 				JoinPathExtraData *extra);
-static bool foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel);
+static bool foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
+					Node *havingQual);
 static List *get_useful_pathkeys_for_relation(PlannerInfo *root,
 								 RelOptInfo *rel);
 static List *get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel);
@@ -427,7 +429,8 @@ static void add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 								Path *epq_path);
 static void add_foreign_grouping_paths(PlannerInfo *root,
 						   RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel);
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra);
 static void apply_server_options(PgFdwRelationInfo *fpinfo);
 static void apply_table_options(PgFdwRelationInfo *fpinfo);
 static void merge_fdw_options(PgFdwRelationInfo *fpinfo,
@@ -2775,13 +2778,19 @@ estimate_path_cost_size(PlannerInfo *root,
 		else if (IS_UPPER_REL(foreignrel))
 		{
 			PgFdwRelationInfo *ofpinfo;
-			PathTarget *ptarget = root->upper_targets[UPPERREL_GROUP_AGG];
+			PathTarget *ptarget = foreignrel->reltarget;
 			AggClauseCosts aggcosts;
 			double		input_rows;
 			int			numGroupCols;
 			double		numGroups = 1;
 
 			/*
+			 * For grouping and/or aggregation, we do set path target in
+			 * grouped_rel's reltarget.  Thus we must have it.
+			 */
+			Assert(ptarget);
+
+			/*
 			 * This cost model is mixture of costing done for sorted and
 			 * hashed aggregates in cost_agg().  We are not sure which
 			 * strategy will be considered at remote side, thus for
@@ -2805,6 +2814,13 @@ estimate_path_cost_size(PlannerInfo *root,
 			{
 				get_agg_clause_costs(root, (Node *) fpinfo->grouped_tlist,
 									 AGGSPLIT_SIMPLE, &aggcosts);
+
+				/*
+				 * Cost of aggregates within the HAVING qual will remain same
+				 * for both parent and a child. Thus, in case of child upper
+				 * rel, it is not necessary to consider translated HAVING qual
+				 * here. Hence use having qual from the root itself.
+				 */
 				get_agg_clause_costs(root, (Node *) root->parse->havingQual,
 									 AGGSPLIT_SIMPLE, &aggcosts);
 			}
@@ -5017,11 +5033,12 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
  * this function to PgFdwRelationInfo of the input relation.
  */
 static bool
-foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
+foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel,
+					Node *havingQual)
 {
 	Query	   *query = root->parse;
-	PathTarget *grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) grouped_rel->fdw_private;
+	PathTarget *grouping_target = grouped_rel->reltarget;
 	PgFdwRelationInfo *ofpinfo;
 	List	   *aggvars;
 	ListCell   *lc;
@@ -5131,11 +5148,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
 	 * Classify the pushable and non-pushable HAVING clauses and save them in
 	 * remote_conds and local_conds of the grouped rel's fpinfo.
 	 */
-	if (root->hasHavingQual && query->havingQual)
+	if (havingQual)
 	{
 		ListCell   *lc;
 
-		foreach(lc, (List *) query->havingQual)
+		foreach(lc, (List *) havingQual)
 		{
 			Expr	   *expr = (Expr *) lfirst(lc);
 			RestrictInfo *rinfo;
@@ -5232,7 +5249,8 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel)
  */
 static void
 postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
-							 RelOptInfo *input_rel, RelOptInfo *output_rel)
+							 RelOptInfo *input_rel, RelOptInfo *output_rel,
+							 void *extra)
 {
 	PgFdwRelationInfo *fpinfo;
 
@@ -5252,7 +5270,8 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
 	fpinfo->pushdown_safe = false;
 	output_rel->fdw_private = fpinfo;
 
-	add_foreign_grouping_paths(root, input_rel, output_rel);
+	add_foreign_grouping_paths(root, input_rel, output_rel,
+							   (GroupPathExtraData *) extra);
 }
 
 /*
@@ -5264,13 +5283,13 @@ postgresGetForeignUpperPaths(PlannerInfo *root, UpperRelationKind stage,
  */
 static void
 add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-						   RelOptInfo *grouped_rel)
+						   RelOptInfo *grouped_rel,
+						   GroupPathExtraData *extra)
 {
 	Query	   *parse = root->parse;
 	PgFdwRelationInfo *ifpinfo = input_rel->fdw_private;
 	PgFdwRelationInfo *fpinfo = grouped_rel->fdw_private;
 	ForeignPath *grouppath;
-	PathTarget *grouping_target;
 	double		rows;
 	int			width;
 	Cost		startup_cost;
@@ -5281,7 +5300,8 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		!root->hasHavingQual)
 		return;
 
-	grouping_target = root->upper_targets[UPPERREL_GROUP_AGG];
+	Assert(extra->patype == PARTITIONWISE_AGGREGATE_NONE ||
+		   extra->patype == PARTITIONWISE_AGGREGATE_FULL);
 
 	/* save the input_rel as outerrel in fpinfo */
 	fpinfo->outerrel = input_rel;
@@ -5295,8 +5315,13 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	fpinfo->user = ifpinfo->user;
 	merge_fdw_options(fpinfo, ifpinfo, NULL);
 
-	/* Assess if it is safe to push down aggregation and grouping. */
-	if (!foreign_grouping_ok(root, grouped_rel))
+	/*
+	 * Assess if it is safe to push down aggregation and grouping.
+	 *
+	 * Use HAVING qual from extra. In case of child partition, it will have
+	 * translated Vars.
+	 */
+	if (!foreign_grouping_ok(root, grouped_rel, extra->havingQual))
 		return;
 
 	/* Estimate the cost of push down */
@@ -5312,7 +5337,7 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Create and add foreign path to the grouping relation. */
 	grouppath = create_foreignscan_path(root,
 										grouped_rel,
-										grouping_target,
+										grouped_rel->reltarget,
 										rows,
 										startup_cost,
 										total_cost,
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4d2e43c..cf32be4 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1932,3 +1932,54 @@ SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE
 SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
 
 RESET enable_partitionwise_join;
+
+
+-- ===================================================================
+-- test partitionwise aggregates
+-- ===================================================================
+
+CREATE TABLE pagg_tab (a int, b int, c text) PARTITION BY RANGE(a);
+
+CREATE TABLE pagg_tab_p1 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p2 (LIKE pagg_tab);
+CREATE TABLE pagg_tab_p3 (LIKE pagg_tab);
+
+INSERT INTO pagg_tab_p1 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 10;
+INSERT INTO pagg_tab_p2 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 20 and (i % 30) >= 10;
+INSERT INTO pagg_tab_p3 SELECT i % 30, i % 50, to_char(i/30, 'FM0000') FROM generate_series(1, 3000) i WHERE (i % 30) < 30 and (i % 30) >= 20;
+
+-- Create foreign partitions
+CREATE FOREIGN TABLE fpagg_tab_p1 PARTITION OF pagg_tab FOR VALUES FROM (0) TO (10) SERVER loopback OPTIONS (table_name 'pagg_tab_p1');
+CREATE FOREIGN TABLE fpagg_tab_p2 PARTITION OF pagg_tab FOR VALUES FROM (10) TO (20) SERVER loopback OPTIONS (table_name 'pagg_tab_p2');;
+CREATE FOREIGN TABLE fpagg_tab_p3 PARTITION OF pagg_tab FOR VALUES FROM (20) TO (30) SERVER loopback OPTIONS (table_name 'pagg_tab_p3');;
+
+ANALYZE pagg_tab;
+ANALYZE fpagg_tab_p1;
+ANALYZE fpagg_tab_p2;
+ANALYZE fpagg_tab_p3;
+
+-- When GROUP BY clause matches with PARTITION KEY.
+-- Plan with partitionwise aggregates is disabled
+SET enable_partitionwise_aggregate TO false;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Plan with partitionwise aggregates is enabled
+SET enable_partitionwise_aggregate TO true;
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, sum(b), min(b), count(*) FROM pagg_tab GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- Check with whole-row reference
+-- Should have all the columns in the target list for the given relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+SELECT a, count(t1) FROM pagg_tab t1 GROUP BY a HAVING avg(b) < 22 ORDER BY 1;
+
+-- When GROUP BY clause does not match with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT b, avg(a), max(a), count(*) FROM pagg_tab GROUP BY b HAVING sum(a) < 700 ORDER BY 1;
+
+
+-- Clean-up
+RESET enable_partitionwise_aggregate;
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 0ed3a47..25915a2 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -359,7 +359,8 @@ void
 GetForeignUpperPaths(PlannerInfo *root,
                      UpperRelationKind stage,
                      RelOptInfo *input_rel,
-                     RelOptInfo *output_rel);
+                     RelOptInfo *output_rel,
+                     void *extra);
 </programlisting>
      Create possible access paths for <firstterm>upper relation</firstterm> processing,
      which is the planner's term for all post-scan/join query processing, such
@@ -379,7 +380,10 @@ GetForeignUpperPaths(PlannerInfo *root,
      currently being considered.  <literal>output_rel</literal> is the upper relation
      that should receive paths representing computation of this step,
      and <literal>input_rel</literal> is the relation representing the input to this
-     step.  (Note that <structname>ForeignPath</structname> paths added
+     step.  The <literal>extra</literal> parameter provides additional details which may
+     be needed for the paths creation, like child details in case of partitioning.
+     The details passed through this parameter depend on the <literal>stage</literal>
+     parameter. (Note that <structname>ForeignPath</structname> paths added
      to <literal>output_rel</literal> would typically not have any direct dependency
      on paths of the <literal>input_rel</literal>, since their processing is expected
      to be done externally.  However, examining paths previously generated for
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031..8bd15c9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -3520,7 +3520,7 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 * upper rel doesn't have relids set, but it covers all the base relations
 	 * participating in the underlying scan, so use root's all_baserels.
 	 */
-	if (IS_UPPER_REL(rel))
+	if (rel->reloptkind == RELOPT_UPPER_REL)
 		scan_plan->fs_relids = root->all_baserels;
 	else
 		scan_plan->fs_relids = best_path->path.parent->relids;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d4acde6..c5fef61 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2208,12 +2208,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	if (final_rel->fdwroutine &&
 		final_rel->fdwroutine->GetForeignUpperPaths)
 		final_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_FINAL,
-													current_rel, final_rel);
+													current_rel, final_rel,
+													NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_FINAL,
-									current_rel, final_rel);
+									current_rel, final_rel, NULL);
 
 	/* Note: currently, we leave it to callers to do set_cheapest() */
 }
@@ -4026,12 +4027,14 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	if (grouped_rel->fdwroutine &&
 		grouped_rel->fdwroutine->GetForeignUpperPaths)
 		grouped_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_GROUP_AGG,
-													  input_rel, grouped_rel);
+													  input_rel, grouped_rel,
+													  extra);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_GROUP_AGG,
-									input_rel, grouped_rel);
+									input_rel, grouped_rel,
+									extra);
 }
 
 /*
@@ -4463,12 +4466,13 @@ create_window_paths(PlannerInfo *root,
 	if (window_rel->fdwroutine &&
 		window_rel->fdwroutine->GetForeignUpperPaths)
 		window_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_WINDOW,
-													 input_rel, window_rel);
+													 input_rel, window_rel,
+													 NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_WINDOW,
-									input_rel, window_rel);
+									input_rel, window_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(window_rel);
@@ -4767,12 +4771,13 @@ create_distinct_paths(PlannerInfo *root,
 	if (distinct_rel->fdwroutine &&
 		distinct_rel->fdwroutine->GetForeignUpperPaths)
 		distinct_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_DISTINCT,
-													   input_rel, distinct_rel);
+													   input_rel, distinct_rel,
+													   NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_DISTINCT,
-									input_rel, distinct_rel);
+									input_rel, distinct_rel, NULL);
 
 	/* Now choose the best path(s) */
 	set_cheapest(distinct_rel);
@@ -4910,12 +4915,13 @@ create_ordered_paths(PlannerInfo *root,
 	if (ordered_rel->fdwroutine &&
 		ordered_rel->fdwroutine->GetForeignUpperPaths)
 		ordered_rel->fdwroutine->GetForeignUpperPaths(root, UPPERREL_ORDERED,
-													  input_rel, ordered_rel);
+													  input_rel, ordered_rel,
+													  NULL);
 
 	/* Let extensions possibly add some more paths */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_ORDERED,
-									input_rel, ordered_rel);
+									input_rel, ordered_rel, NULL);
 
 	/*
 	 * No need to bother with set_cheapest here; grouping_planner does not
@@ -6696,7 +6702,8 @@ create_partial_grouping_paths(PlannerInfo *root,
 
 		fdwroutine->GetForeignUpperPaths(root,
 										 UPPERREL_PARTIAL_GROUP_AGG,
-										 input_rel, partially_grouped_rel);
+										 input_rel, partially_grouped_rel,
+										 extra);
 	}
 
 	return partially_grouped_rel;
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 6e510f9..5236ab3 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1032,7 +1032,7 @@ postprocess_setop_rel(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	if (create_upper_paths_hook)
 		(*create_upper_paths_hook) (root, UPPERREL_SETOP,
-									NULL, rel);
+									NULL, rel, NULL);
 
 	/* Select cheapest path */
 	set_cheapest(rel);
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index e88fee3..ea83c7b 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -62,7 +62,8 @@ typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
 typedef void (*GetForeignUpperPaths_function) (PlannerInfo *root,
 											   UpperRelationKind stage,
 											   RelOptInfo *input_rel,
-											   RelOptInfo *output_rel);
+											   RelOptInfo *output_rel,
+											   void *extra);
 
 typedef void (*AddForeignUpdateTargets_function) (Query *parsetree,
 												  RangeTblEntry *target_rte,
diff --git a/src/include/optimizer/planner.h b/src/include/optimizer/planner.h
index 0d8b88d..07a3bc0 100644
--- a/src/include/optimizer/planner.h
+++ b/src/include/optimizer/planner.h
@@ -28,7 +28,8 @@ extern PGDLLIMPORT planner_hook_type planner_hook;
 typedef void (*create_upper_paths_hook_type) (PlannerInfo *root,
 											  UpperRelationKind stage,
 											  RelOptInfo *input_rel,
-											  RelOptInfo *output_rel);
+											  RelOptInfo *output_rel,
+											  void *extra);
 extern PGDLLIMPORT create_upper_paths_hook_type create_upper_paths_hook;
 
 
-- 
1.8.3.1
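
A note on the void *extra parameter added by the 0002 patch: the callee has to decide how to interpret it from the stage argument. The standalone sketch below shows that pattern with hypothetical simplified types (UpperStage, GroupExtra, having_for_stage()); it is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for UpperRelationKind. */
typedef enum
{
	STAGE_GROUP_AGG,
	STAGE_WINDOW,
	STAGE_FINAL
} UpperStage;

/* Simplified stand-in for GroupPathExtraData. */
typedef struct GroupExtra
{
	int			flags;
	const char *having_qual;
} GroupExtra;

/*
 * Interpret "extra" according to the stage: only the grouping stage passes
 * grouping details; other stages pass NULL and have nothing to report.
 */
static const char *
having_for_stage(UpperStage stage, void *extra)
{
	if (stage != STAGE_GROUP_AGG)
		return NULL;

	/* For the grouping stage the caller is expected to supply details. */
	assert(extra != NULL);
	return ((GroupExtra *) extra)->having_qual;
}
```

The cast is only safe because caller and callee agree on what each stage passes, which is the reason the documentation change ties the meaning of extra to stage.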

0003-Add-agg_costs-in-GroupPathExtraData-so-that-FDW-can-.patch (text/x-patch; charset=US-ASCII)

From f6569b886cd4db9b7bb1de65b530a692dd6dc734 Mon Sep 17 00:00:00 2001
From: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Date: Thu, 29 Mar 2018 18:04:05 +0530
Subject: [PATCH 3/3] Add agg_costs in GroupPathExtraData so that FDW can use
 it.

We do pass GroupPathExtraData to postgres_fdw, thus add agg_costs
too in the struct so that we can use these cost estimates in
postgres_fdw too avoiding recalculations.

In passing, updated couple of function signatures where we were
passing both agg_costs and "extra" so that it just passes the
"extra" parameter.
---
 contrib/postgres_fdw/postgres_fdw.c  | 43 +++++++++++++-----------------------
 src/backend/optimizer/plan/planner.c | 31 ++++++++++++--------------
 src/include/nodes/relation.h         |  3 +++
 3 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index dbebbda..6d85569 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -362,6 +362,7 @@ static void estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *foreignrel,
 						List *param_join_conds,
 						List *pathkeys,
+						const AggClauseCosts *agg_costs,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost);
 static void get_remote_estimate(const char *sql,
@@ -614,7 +615,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		 * values in fpinfo so we don't need to do it again to generate the
 		 * basic foreign path.
 		 */
-		estimate_path_cost_size(root, baserel, NIL, NIL,
+		estimate_path_cost_size(root, baserel, NIL, NIL, NULL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 
@@ -645,7 +646,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		set_baserel_size_estimates(root, baserel);
 
 		/* Fill in basically-bogus cost estimates for use later. */
-		estimate_path_cost_size(root, baserel, NIL, NIL,
+		estimate_path_cost_size(root, baserel, NIL, NIL, NULL,
 								&fpinfo->rows, &fpinfo->width,
 								&fpinfo->startup_cost, &fpinfo->total_cost);
 	}
@@ -1076,7 +1077,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 
 		/* Get a cost estimate from the remote */
 		estimate_path_cost_size(root, baserel,
-								param_info->ppi_clauses, NIL,
+								param_info->ppi_clauses, NIL, NULL,
 								&rows, &width,
 								&startup_cost, &total_cost);
 
@@ -2588,6 +2589,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						RelOptInfo *foreignrel,
 						List *param_join_conds,
 						List *pathkeys,
+						const AggClauseCosts *agg_costs,
 						double *p_rows, int *p_width,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
@@ -2779,7 +2781,6 @@ estimate_path_cost_size(PlannerInfo *root,
 		{
 			PgFdwRelationInfo *ofpinfo;
 			PathTarget *ptarget = foreignrel->reltarget;
-			AggClauseCosts aggcosts;
 			double		input_rows;
 			int			numGroupCols;
 			double		numGroups = 1;
@@ -2808,23 +2809,6 @@ estimate_path_cost_size(PlannerInfo *root,
 			input_rows = ofpinfo->rows;
 			width = ofpinfo->width;
 
-			/* Collect statistics about aggregates for estimating costs. */
-			MemSet(&aggcosts, 0, sizeof(AggClauseCosts));
-			if (root->parse->hasAggs)
-			{
-				get_agg_clause_costs(root, (Node *) fpinfo->grouped_tlist,
-									 AGGSPLIT_SIMPLE, &aggcosts);
-
-				/*
-				 * Cost of aggregates within the HAVING qual will remain same
-				 * for both parent and a child. Thus, in case of child upper
-				 * rel, it is not necessary to consider translated HAVING qual
-				 * here. Hence use having qual from the root itself.
-				 */
-				get_agg_clause_costs(root, (Node *) root->parse->havingQual,
-									 AGGSPLIT_SIMPLE, &aggcosts);
-			}
-
 			/* Get number of grouping columns and possible number of groups */
 			numGroupCols = list_length(root->parse->groupClause);
 			numGroups = estimate_num_groups(root,
@@ -2838,6 +2822,9 @@ estimate_path_cost_size(PlannerInfo *root,
 			 */
 			rows = retrieved_rows = numGroups;
 
+			/* agg_costs should not be null while grouping. */
+			Assert(agg_costs);
+
 			/*-----
 			 * Startup cost includes:
 			 *	  1. Startup cost for underneath input * relation
@@ -2846,8 +2833,8 @@ estimate_path_cost_size(PlannerInfo *root,
 			 *-----
 			 */
 			startup_cost = ofpinfo->rel_startup_cost;
-			startup_cost += aggcosts.transCost.startup;
-			startup_cost += aggcosts.transCost.per_tuple * input_rows;
+			startup_cost += agg_costs->transCost.startup;
+			startup_cost += agg_costs->transCost.per_tuple * input_rows;
 			startup_cost += (cpu_operator_cost * numGroupCols) * input_rows;
 			startup_cost += ptarget->cost.startup;
 
@@ -2859,7 +2846,7 @@ estimate_path_cost_size(PlannerInfo *root,
 			 *-----
 			 */
 			run_cost = ofpinfo->rel_total_cost - ofpinfo->rel_startup_cost;
-			run_cost += aggcosts.finalCost * numGroups;
+			run_cost += agg_costs->finalCost * numGroups;
 			run_cost += cpu_tuple_cost * numGroups;
 			run_cost += ptarget->cost.per_tuple * numGroups;
 		}
@@ -4761,7 +4748,7 @@ add_paths_with_pathkeys_for_rel(PlannerInfo *root, RelOptInfo *rel,
 		List	   *useful_pathkeys = lfirst(lc);
 		Path	   *sorted_epq_path;
 
-		estimate_path_cost_size(root, rel, NIL, useful_pathkeys,
+		estimate_path_cost_size(root, rel, NIL, useful_pathkeys, NULL,
 								&rows, &width, &startup_cost, &total_cost);
 
 		/*
@@ -4993,7 +4980,7 @@ postgresGetForeignJoinPaths(PlannerInfo *root,
 														extra->sjinfo);
 
 	/* Estimate costs for bare join relation */
-	estimate_path_cost_size(root, joinrel, NIL, NIL, &rows,
+	estimate_path_cost_size(root, joinrel, NIL, NIL, NULL, &rows,
 							&width, &startup_cost, &total_cost);
 	/* Now update this information in the joinrel */
 	joinrel->rows = rows;
@@ -5325,8 +5312,8 @@ add_foreign_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 		return;
 
 	/* Estimate the cost of push down */
-	estimate_path_cost_size(root, grouped_rel, NIL, NIL, &rows,
-							&width, &startup_cost, &total_cost);
+	estimate_path_cost_size(root, grouped_rel, NIL, NIL, extra->agg_costs,
+							&rows, &width, &startup_cost, &total_cost);
 
 	/* Now update this information in the fpinfo */
 	fpinfo->rows = rows;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c5fef61..32b2dc8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -152,7 +152,6 @@ static RelOptInfo *make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 static void create_ordinary_grouping_paths(PlannerInfo *root,
 							   RelOptInfo *input_rel,
 							   RelOptInfo *grouped_rel,
-							   const AggClauseCosts *agg_costs,
 							   grouping_sets_data *gd,
 							   GroupPathExtraData *extra,
 							   RelOptInfo **partially_grouped_rel_p);
@@ -207,7 +206,6 @@ static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
 static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
 						  RelOptInfo *partially_grouped_rel,
-						  const AggClauseCosts *agg_costs,
 						  grouping_sets_data *gd,
 						  double dNumGroups,
 						  GroupPathExtraData *extra);
@@ -229,7 +227,6 @@ static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
 									RelOptInfo *partially_grouped_rel,
-									const AggClauseCosts *agg_costs,
 									grouping_sets_data *gd,
 									PartitionwiseAggregateType patype,
 									GroupPathExtraData *extra);
@@ -3689,6 +3686,12 @@ create_grouping_paths(PlannerInfo *root,
 		GroupPathExtraData extra;
 
 		/*
+		 * Set agg_costs into the extra so that other routines like FDW can use
+		 * it directly rather than computing it again.
+		 */
+		extra.agg_costs = agg_costs;
+
+		/*
 		 * Determine whether it's possible to perform sort-based
 		 * implementations of grouping.  (Note that if groupClause is empty,
 		 * grouping_is_sortable() is trivially true, and all the
@@ -3751,9 +3754,8 @@ create_grouping_paths(PlannerInfo *root,
 		else
 			extra.patype = PARTITIONWISE_AGGREGATE_NONE;
 
-		create_ordinary_grouping_paths(root, input_rel, grouped_rel,
-									   agg_costs, gd, &extra,
-									   &partially_grouped_rel);
+		create_ordinary_grouping_paths(root, input_rel, grouped_rel, gd,
+									   &extra, &partially_grouped_rel);
 	}
 
 	set_cheapest(grouped_rel);
@@ -3907,9 +3909,7 @@ create_degenerate_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
  */
 static void
 create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
-							   RelOptInfo *grouped_rel,
-							   const AggClauseCosts *agg_costs,
-							   grouping_sets_data *gd,
+							   RelOptInfo *grouped_rel, grouping_sets_data *gd,
 							   GroupPathExtraData *extra,
 							   RelOptInfo **partially_grouped_rel_p)
 {
@@ -3979,8 +3979,8 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Apply partitionwise aggregation technique, if possible. */
 	if (patype != PARTITIONWISE_AGGREGATE_NONE)
 		create_partitionwise_grouping_paths(root, input_rel, grouped_rel,
-											partially_grouped_rel, agg_costs,
-											gd, patype, extra);
+											partially_grouped_rel, gd, patype,
+											extra);
 
 	/* If we are doing partial aggregation only, return. */
 	if (extra->patype == PARTITIONWISE_AGGREGATE_PARTIAL)
@@ -4010,8 +4010,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
 	/* Build final grouping paths */
 	add_paths_to_grouping_rel(root, input_rel, grouped_rel,
-							  partially_grouped_rel, agg_costs, gd,
-							  dNumGroups, extra);
+							  partially_grouped_rel, gd, dNumGroups, extra);
 
 	/* Give a helpful error if we failed to find any implementation */
 	if (grouped_rel->pathlist == NIL)
@@ -6189,7 +6188,6 @@ static void
 add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 						  RelOptInfo *grouped_rel,
 						  RelOptInfo *partially_grouped_rel,
-						  const AggClauseCosts *agg_costs,
 						  grouping_sets_data *gd, double dNumGroups,
 						  GroupPathExtraData *extra)
 {
@@ -6199,6 +6197,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 	List	   *havingQual = (List *) extra->havingQual;
+	const AggClauseCosts *agg_costs = extra->agg_costs;
 	AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
 
 	if (can_sort)
@@ -6906,7 +6905,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
 									RelOptInfo *partially_grouped_rel,
-									const AggClauseCosts *agg_costs,
 									grouping_sets_data *gd,
 									PartitionwiseAggregateType patype,
 									GroupPathExtraData *extra)
@@ -7006,8 +7004,7 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 
 		/* Create grouping paths for this child relation. */
 		create_ordinary_grouping_paths(root, child_input_rel,
-									   child_grouped_rel,
-									   agg_costs, gd, &child_extra,
+									   child_grouped_rel, gd, &child_extra,
 									   &child_partially_grouped_rel);
 
 		if (child_partially_grouped_rel)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 2b4f773..4ed3eb7 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -2335,6 +2335,8 @@ typedef enum
 /*
  * Struct for extra information passed to subroutines of create_grouping_paths
  *
+ * agg_costs gives cost info about all aggregates in query (in AGGSPLIT_SIMPLE
+ * 		mode)
  * flags indicating what kinds of grouping are possible.
  * partial_costs_set is true if the agg_partial_costs and agg_final_costs
  * 		have been initialized.
@@ -2348,6 +2350,7 @@ typedef enum
 typedef struct
 {
 	/* Data which remains constant once set. */
+	const AggClauseCosts *agg_costs;
 	int			flags;
 	bool		partial_costs_set;
 	AggClauseCosts agg_partial_costs;
-- 
1.8.3.1
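
For readers skimming the diff, here is a minimal standalone sketch of the idea in 0003: the caller computes the aggregate costs once, hangs a const pointer to them on the "extra" struct, and the cost estimator reads them from there instead of recomputing. All Sketch* types and the sketch_* function are hypothetical stand-ins, not the real PostgreSQL definitions. The arithmetic follows the startup and run cost lines visible in the patch, minus the PathTarget terms, and the final total is simply startup plus run cost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real planner structs. */
typedef double Cost;

typedef struct
{
	Cost		trans_startup;		/* one-time transition setup cost */
	Cost		trans_per_tuple;	/* transition cost per input row */
	Cost		final_cost;			/* finalization cost per group */
} SketchAggCosts;

typedef struct
{
	const SketchAggCosts *agg_costs;	/* computed once by the caller */
	int			flags;
} SketchGroupExtra;

typedef struct
{
	Cost		rel_startup_cost;	/* startup cost of the input relation */
	Cost		rel_total_cost;		/* total cost of the input relation */
	double		input_rows;
	double		num_groups;
	int			num_group_cols;
	Cost		cpu_operator_cost;
	Cost		cpu_tuple_cost;
} SketchGroupingInput;

/*
 * Estimate grouping cost from costs precomputed by the caller.
 * PathTarget evaluation costs are left out to keep the sketch short.
 */
void
sketch_estimate_grouping_cost(const SketchGroupingInput *in,
							  const SketchGroupExtra *extra,
							  Cost *p_startup_cost, Cost *p_total_cost)
{
	const SketchAggCosts *agg_costs = extra->agg_costs;
	Cost		startup_cost;
	Cost		run_cost;

	/* agg_costs must be supplied when estimating a grouped relation. */
	assert(agg_costs != NULL);

	startup_cost = in->rel_startup_cost;
	startup_cost += agg_costs->trans_startup;
	startup_cost += agg_costs->trans_per_tuple * in->input_rows;
	startup_cost += (in->cpu_operator_cost * in->num_group_cols) * in->input_rows;

	run_cost = in->rel_total_cost - in->rel_startup_cost;
	run_cost += agg_costs->final_cost * in->num_groups;
	run_cost += in->cpu_tuple_cost * in->num_groups;

	*p_startup_cost = startup_cost;
	*p_total_cost = startup_cost + run_cost;
}
```

Callers that estimate plain scans or joins have no aggregate costs to pass, which is why the patch passes NULL at those call sites and asserts only on the grouping branch.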

#150

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#148)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 29, 2018 at 4:13 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Mar 28, 2018 at 7:21 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Ah sorry, I was wrong about remote_conds. remote_conds and local_conds
are basically the conditions on the relation being pushed down.
havingQuals are conditions on a grouped relation, so treating them like
baserestrictinfo or join conditions looks more straightforward
than having a separate member in PgFdwRelationInfo. So, remote
havingQuals go into remote_conds and local havingQuals go to
local_conds.

Looks like we already do that. Then we have remote_conds and local_conds,
which together should be equivalent to havingQual. Storing all
three doesn't make sense. In the future someone may use havingQual instead
of remote_conds/local_conds just because it's available, and then there
is a risk of these three lists going out of sync.

Yep, I see the risk.
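
To make the point above concrete, here is a small standalone sketch of "classify once, keep two lists, derive the rest". Every HAVING condition lands in exactly one of remote_conds or local_conds, and the full set is recomputed from those two when needed rather than stored as a third copy. The Sketch* types, the shippable flag, and the sketch_* functions are hypothetical; the flag stands in for whatever shippability check postgres_fdw actually applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CONDS 8

/* Hypothetical stand-in for a qual; not a PostgreSQL node type. */
typedef struct
{
	const char *text;		/* printable form of the condition */
	bool		shippable;	/* assumed: can the remote server evaluate it? */
} SketchQual;

/* Only the two derived lists are stored; no separate havingQual copy. */
typedef struct
{
	const SketchQual *remote_conds[SKETCH_MAX_CONDS];
	size_t		n_remote;
	const SketchQual *local_conds[SKETCH_MAX_CONDS];
	size_t		n_local;
} SketchRelInfo;

/* Put each qual into exactly one list. Returns false if there are too many. */
bool
sketch_classify_quals(const SketchQual *quals, size_t n, SketchRelInfo *info)
{
	size_t		i;

	assert(info != NULL);
	info->n_remote = 0;
	info->n_local = 0;

	if (n > SKETCH_MAX_CONDS)
		return false;

	for (i = 0; i < n; i++)
	{
		if (quals[i].shippable)
			info->remote_conds[info->n_remote++] = &quals[i];
		else
			info->local_conds[info->n_local++] = &quals[i];
	}
	return true;
}

/* The full set is derived on demand, so it cannot drift out of sync. */
size_t
sketch_count_all_quals(const SketchRelInfo *info)
{
	return info->n_remote + info->n_local;
}
```

With only two lists there is a single source of truth for each condition, which is the property the three-list layout would put at risk.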

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#151

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Jeevan Chalke (#150)

Re: [HACKERS] Partition-wise aggregation/grouping

On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Yep, I see the risk.

Committed 0001 last week and 0002 just now. I don't really see 0003 as
a critical need. If somebody demonstrates that this saves a
meaningful amount of planning time, we can consider that part for a
future release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#152

David Steele

david@pgmasters.net

almost 8 years ago

In reply to: Robert Haas (#151)

Re: [HACKERS] Partition-wise aggregation/grouping

Hi Jeevan,

On 4/2/18 10:57 AM, Robert Haas wrote:

On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Yep, I see the risk.

Committed 0001 last week and 0002 just now. I don't really see 0003 as
a critical need. If somebody demonstrates that this saves a
meaningful amount of planning time, we can consider that part for a
future release.

The bulk of this patch was committed so I have marked it that way.

If you would like to pursue patch 03 I think it would be best to start a
new thread and demonstrate how the patch will improve performance.

Regards,
--
-David
david@pgmasters.net

#153

Jeevan Chalke

jeevan.chalke@enterprisedb.com

almost 8 years ago

In reply to: David Steele (#152)

Re: [HACKERS] Partition-wise aggregation/grouping

On Tue, Apr 10, 2018 at 7:30 PM, David Steele <david@pgmasters.net> wrote:

Hi Jeevan,

On 4/2/18 10:57 AM, Robert Haas wrote:

On Thu, Mar 29, 2018 at 9:02 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:

Yep, I see the risk.

Committed 0001 last week and 0002 just now. I don't really see 0003 as
a critical need. If somebody demonstrates that this saves a
meaningful amount of planning time, we can consider that part for a
future release.

The bulk of this patch was committed so I have marked it that way.

Thanks, David.

If you would like to pursue patch 03 I think it would be best to start a
new thread and demonstrate how the patch will improve performance.

Sure.

Regards,
--
-David
david@pgmasters.net

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company