Eager aggregation, take 3
Hi All,
Eager aggregation is a query optimization technique that partially
pushes a group-by past a join and finalizes the aggregation once all
of the relations have been joined. Eager aggregation reduces the
number of input rows to the join and thus may result in a better
overall plan. This technique is described in detail in the 'Eager
Aggregation and Lazy Aggregation' paper [1].
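To illustrate the idea, here is a toy example (the tables and data are
hypothetical, purely for illustration). Conceptually, eager aggregation
turns the first query below into something like the second, where 'b'
is partially aggregated on its join key before the join and the
aggregation is finalized above the join:

  CREATE TABLE a (i int, j int);
  CREATE TABLE b (j int, y numeric);

  -- Original form: every row of b flows through the join before
  -- being aggregated.
  SELECT a.i, sum(b.y)
  FROM a JOIN b ON a.j = b.j
  GROUP BY a.i;

  -- Hand-written equivalent of the transformation: partially
  -- aggregate b on its join key, join, then combine the partial
  -- results in the final aggregation.
  SELECT a.i, sum(b_agg.sum_y)
  FROM a
       JOIN (SELECT b.j, sum(b.y) AS sum_y
             FROM b
             GROUP BY b.j) AS b_agg ON a.j = b_agg.j
  GROUP BY a.i;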
Back in 2017, a patch set was proposed by Antonin Houska to implement
eager aggregation in thread [2]. However, it was ultimately withdrawn
after entering the pattern of "please rebase thx" followed by rebasing
and getting no feedback until "please rebase again thx". A second
attempt in 2022 unfortunately fell into the same pattern about one year
ago and was eventually closed again [3].
That patch set included most of the concepts necessary to implement
eager aggregation. However, as far as I can see, it has several weak
points that we need to address. It introduces invasive changes to some
core planner functions, such as make_join_rel(), and with those changes
join_is_legal() would be performed three times for the same proposed
join, which is not great. Another weak point is that the complexity of
the join search increases dramatically as more relations are joined.
This occurs because, when we generate partially aggregated paths, each
path of the input relation is considered as an input path for the
grouped paths. As a result, the number of grouped paths we generate
grows exponentially, leading to a significant explosion in planning
time. Other weak points include the lack of support for outer joins
and partitionwise joins. And during my review of the code, I came
across several bugs (planning errors or crashes) that need to be
addressed.
I'd like to give implementing eager aggregation another try, while
borrowing many concepts and chunks of code from the previous patch set.
Please see attached. I have chosen to use the term 'Eager Aggregation'
from the paper [1] instead of 'Aggregation push-down', to distinguish
this technique from the aggregation push-down performed by FDWs.
The patch has been split into small pieces to make the review easier.
0001 introduces the RelInfoList structure, which encapsulates both a
list and a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for upper relations. With
eager aggregation, it is possible that we generate so many upper rels of
type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
lookups.
0002 introduces the RelAggInfo structure to store information needed to
create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
0003 checks if eager aggregation is applicable, and if so, collects
suitable aggregate expressions and grouping expressions in the query,
and records them in root->agg_clause_list and root->group_expr_list
respectively.
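For instance, for the toy query above, the information collected at
this step would amount to something like the following (a sketch of the
bookkeeping, not the actual structures):

  -- SELECT a.i, sum(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;
  --
  -- root->agg_clause_list: sum(b.y), noting the relations it is
  --                        evaluated at (here just b)
  -- root->group_expr_list: a.i, with the tleSortGroupRef it carries
  --                        in the query's GROUP BY clause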
0004 implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create a RelAggInfo
structure for the relation, using the information about aggregate
expressions and grouping expressions we collected earlier. In this
patch, when we check whether a target expression can act as a grouping
expression, we also need to check whether the expression is known
equal, due to equivalence classes, to other expressions that can act as
grouping expressions. This patch leverages the function
exprs_known_equal() to achieve that, after enhancing it to consider the
opfamily if one is provided.
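As a hypothetical example of why this matters: in the query below, b.j
does not appear in GROUP BY, yet the join clause makes it known equal
to a.i, so it can serve as the grouping key for the partial aggregation
pushed down to b:

  SELECT a.i, sum(b.y)
  FROM a JOIN b ON a.i = b.j
  GROUP BY a.i;

  -- Grouping b by b.j below the join is only valid because the EC
  -- {a.i, b.j} guarantees equality under the btree opfamily used by
  -- the GROUP BY clause; that is what the enhanced
  -- exprs_known_equal() verifies when it is passed an opfamily.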
0005 implements the functions that generate paths for grouped relations
by adding sorted and hashed partial aggregation paths on top of paths
of the plain base or join relations. For sorted partial aggregation
paths, we consider any suitably-sorted input paths, as well as sorting
the cheapest-total path. For hashed partial aggregation paths, we only
consider the cheapest-total path as input. By not considering other
paths we keep the number of grouped paths as small as possible while
still achieving reasonable results.
0006 builds grouped relations for each base relation if possible, and
generates aggregation paths for the grouped base relations.
0007 builds grouped relations for each just-processed join relation if
possible, and generates aggregation paths for the grouped join
relations. The changes made to make_join_rel() are relatively minor,
with the addition of a new function make_grouped_join_rel(), which finds
or creates a grouped relation for the just-processed joinrel, and
generates grouped paths by joining a grouped input relation with a
non-grouped input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see 0005).
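So for the toy query, the grouped relation for the join {a, b} can
obtain paths of two shapes, roughly as sketched below (illustrative
path trees, not actual EXPLAIN output):

  -- Way 1, built by make_grouped_join_rel(): join the grouped rel
  -- for b with the plain rel for a.
  --
  --   Join (a.j = b.j)
  --     -> Seq Scan on a
  --     -> Partial Aggregate (Group Key: b.j)
  --          -> Seq Scan on b
  --
  -- Way 2, built in standard_join_search() after set_cheapest():
  -- partial aggregation on top of the plain joinrel's paths.
  --
  --   Partial Aggregate (Group Key: a.i)
  --     -> Join (a.j = b.j)
  --          -> Seq Scan on a
  --          -> Seq Scan on b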
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise joins and GEQO.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by the grouping expressions or aggregate functions is
nullable by an outer join above the relation to which we want to apply
the partial aggregation. Thanks to Tom's outer-join-aware-Var
infrastructure, we can easily identify such situations and refrain from
pushing down the aggregates.
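As a hypothetical example of this restriction:

  SELECT b.j, count(*)
  FROM a LEFT JOIN b ON a.j = b.j
  GROUP BY b.j;

  -- b.j is nullable by the left join above b: rows of a without a
  -- match are null-extended and belong to the b.j = NULL group. A
  -- partial aggregate computed on b alone can never see those rows,
  -- so pushing the aggregation down to b would produce wrong
  -- results.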
Starting from this patch, you should be able to see plans with eager
aggregation.
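For the toy query from the beginning, the plan could look roughly like
the following (an illustrative sketch of the expected plan shape; the
actual plan of course depends on the data and costs):

  EXPLAIN (COSTS OFF)
  SELECT a.i, sum(b.y)
  FROM a JOIN b ON a.j = b.j
  GROUP BY a.i;

  --  Finalize HashAggregate
  --    Group Key: a.i
  --    ->  Hash Join
  --          Hash Cond: (a.j = b.j)
  --          ->  Seq Scan on a
  --          ->  Hash
  --                ->  Partial HashAggregate
  --                      Group Key: b.j
  --                      ->  Seq Scan on b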
0008 adds test cases for eager aggregation.
0009 adds a section to the README that describes this feature (copied
from the previous patch set, with minor tweaks).
Thoughts and comments are welcome.
[1]: https://www.vldb.org/conf/1995/P345.PDF
[2]: /messages/by-id/9666.1491295317@localhost
[3]: /messages/by-id/OS3PR01MB66609589B896FBDE190209F495EE9@OS3PR01MB6660.jpnprd01.prod.outlook.com
Thanks
Richard
Attachments:
v1-0001-Introduce-RelInfoList-structure.patch
From 542f02eb98b84dad9990c03bef792bb3e816fd23 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v1 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d404fbf262..351bf2e9e4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3413,9 +3413,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ca47c7d310..3341e64d2b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..9e25750acd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ *	  Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..be51e2c652 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+	RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
v1-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 6b3b7a944bbb018e77dd8e4b787b9c660a9ed69b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v1 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9384c54ed9..f47ad04846 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -41,6 +41,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
@@ -50,6 +51,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3306,6 +3308,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6f79b2e3fe..ee455f7ec2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2699,8 +2699,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2952,8 +2951,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -2999,8 +2997,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3158,8 +3155,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index dcea10888b..68fc05432c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
v1-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch
From a7658376eb1461132627825f4deabb73a4e53d1d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v1 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 624 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 662 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 8b8def21ca..db66a3e189 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -68,6 +68,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c88da963db..e7f465ef7b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * will also be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,567 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ *	  Check if the given relation can produce grouped paths, and if so,
+ *	  return the information needed to create them. The given relation is
+ *	  the non-grouped one, whose reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+	 * If this is a child rel, the grouped rel for its parent rel must have
+	 * been created if that was possible. So we can just use the parent's
+	 * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+	 * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+	 * current rel or in that of a lower rel --- depending on the input
+	 * paths. For example, consider rel->relids = {A, B, C} with ph_eval_at =
+	 * {B, C}. The path "A JOIN (B JOIN C)" implies that the PHV is evaluated
+	 * by "(B JOIN C)", while the path "(A JOIN B) JOIN C" must evaluate the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+		RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * current rel should be grouped because we do not support join of two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The variable is needed for a join, however it's neither in
+ * the GROUP BY clause nor can it be derived from it using EC.
+ * (Otherwise it would have to be added to the targets above.)
+ * We need to construct special SortGroupClause for this
+ * variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of the
+ * SortGroupClause's until we're done with the iteration of
+ * rel->reltarget->exprs. Also it makes sense for the caller to
+ * do some more check before it starts to create those
+ * SortGroupClause's.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't be actually used for grouping key evaluation
+ * (instead, the one this depends on will be), so sortgroupref
+ * should not be important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+			 * If eager aggregation is extended to support generic grouping
+			 * expressions in the future, create_rel_agg_info() will have to
+			 * add this variable to the "agg_input" target and also add the
+			 * whole generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ *	  Check whether the given Var appears in Aggref(s) which we consider
+ *	  usable at the relation / join level, and only in such Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+			break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+ * want to exclude the situation where the Var is only needed in final
+ * output. So include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check if the 'expr' is known equal to other exprs
+ * due to equivalence relationships that can act as grouping expressions.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 69ed9eb1f6..3ef5195323 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
*/
 RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..8d03ce2c57 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -344,4 +348,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040a047b81..dcea10888b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
Attachment: v1-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From efad6c39e247078c6d3cdf3cf8561bd5d35004e6 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v1 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9e25750acd..c88da963db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ *	  Construct the auxiliary hash table for relation-specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+	/* Insert all the already-existing relation-specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ *	  Find a RelOptInfo or RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ *	  Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index be51e2c652..d67f725ad6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Code iterating over ->exprs may rely
+ * on this arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. it is not
+ * initialized when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
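+ *
+ * For example, given the query
+ *     SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;
+ * the RelAggInfo for relation "b" would have a "target" consisting of b.j and
+ * the partial form of avg(b.y), while "agg_input" would consist of b.j and
+ * b.y, with b.j having its sortgroupref set.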
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+	/* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
Attachment: v1-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From 9798f1f4e4d1e6aef6b712df452fc5f14e736292 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v1 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 351bf2e9e4..9384c54ed9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -80,6 +80,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d4a9d77d7f..36c82bd696 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -328,6 +331,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+	 * Don't apply eager aggregation if disabled by the user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+	 * Don't apply eager aggregation if there is no GROUP BY clause.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+	 * SRFs are not allowed in aggregate arguments, and we don't want them in
+	 * the GROUP BY clause either, so forbid them in general. It would need
+	 * further analysis to determine whether evaluating a GROUP BY clause
+	 * containing an SRF below the query targetlist is correct. Currently
+	 * this does not seem to be an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+	 * Collect aggregate expressions that appear in the targetlist and HAVING
+	 * clause.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, leave root->agg_clause_list NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, leave root->group_expr_list
+ * NIL and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+		 * Get the equality operator from the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3341e64d2b..8b8def21ca 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -78,6 +78,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -264,6 +266,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 527a2b2734..515e6d7737 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -984,6 +984,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..f841915482 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -401,6 +401,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d67f725ad6..69ed9eb1f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+	/* list of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in the targetlist and HAVING clause
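+ *
+ * For example, for the query
+ *     SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;
+ * one AggClauseInfo is created for avg(b.y), with agg_eval_at containing
+ * only the relation "b".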
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e8a9c94ba..040a047b81 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
Attachment: v1-0006-Build-grouped-relations-out-of-base-relations.patch
From 4d5639555cb14fa74f20e61ba79c155ec9be8b23 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v1 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f47ad04846..ea2341d110 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -96,6 +96,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -120,6 +121,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -188,6 +190,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+	 * Build a grouped base relation for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -329,6 +336,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -565,6 +625,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1292,6 +1361,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e7f465ef7b..83cdbb38bc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "rewrite/rewriteManip.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We should have aggregate expressions and grouping expressions
+	 * available, otherwise we would not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8d03ce2c57..6b856a5e77 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
Attachment: v1-0007-Build-grouped-relations-out-of-join-relations.patch
From 429cab42ee94a88eef79dfb3575ded35b8056a1c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v1 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
If we are joining rel1 and rel2, the aggregation paths for the grouped
join relation are generated by 1) joining the grouped paths of rel1 to
the plain paths of rel2, or joining the grouped paths of rel2 to the
plain paths of rel1, and 2) adding sorted and hashed partial aggregation
paths on top of paths of the plain join rel except for the topmost join
rel.
This commit also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that.
This commit also makes eager aggregation work for partitionwise join and
for geqo.
Starting from this commit, you should be able to see plans with eager
aggregation.
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 115 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 35 ++++++--
src/backend/optimizer/util/appendinfo.c | 64 +++++++++++++
5 files changed, 320 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+	 * Restore each of the lists in join_rel_list, agg_info_list and
+	 * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+	 * back the original hash tables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ea2341d110..440a5daec7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3864,6 +3864,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3884,6 +3888,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4752,6 +4777,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..a9ef081597 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -864,6 +873,107 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
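+ *
+ * For example, when joining rel1 = A and rel2 = B and a grouped relation for
+ * B exists, the grouped paths of B are joined to the plain paths of A,
+ * producing paths for the grouped relation {A B}.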
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ joinrelids = bms_union(rel1->relids, rel2->relids);
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ /* we should not see dummy grouped relation */
+ Assert(rel1_grouped == NULL || !IS_DUMMY_REL(rel1_grouped));
+ Assert(rel2_grouped == NULL || !IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_grouped == NULL &&
+ rel2_grouped == NULL)
+ return;
+
+ /*
+	 * A join of two grouped relations is currently not supported. In such a
+	 * case, grouping one side would change how many times the other side's
+	 * aggregate transient states appear in the input of the final
+	 * aggregation. This could be handled by adjusting the transient states,
+	 * but it's not worth the effort for now.
+ */
+ if (rel1_grouped != NULL &&
+ rel2_grouped != NULL)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (rel1_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (rel2_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1763,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index be4e182869..f8f2a09f1b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3946,10 +3946,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
+
+ /*
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
+ */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
- }
/*
* Estimate number of groups.
@@ -7043,6 +7049,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+	 * The partially_grouped_rel could already have been created by eager
+	 * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7066,19 +7079,25 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+	 * Note that the partially_grouped_rel could already have been created
+	 * and populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
!force_rel_creation)
- return NULL;
+ return partially_grouped_rel;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 51fdeace7d..7016473047 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+		/* Translate the target expressions to the child's relids */
+		newtarget->exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+										   context);
+
+		if (oldtarget->sortgrouprefs)
+		{
+			Size		nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+			newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+			memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+		}
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
Attachment: v1-0009-Add-README.patch
From 2037ffb3a2636203d4105c2ee0e47b9aa67041d7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v1 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..fa5cdc135f 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be usable for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and the join clauses in turn
+determine the choice of grouping columns for the partial aggregate: the only
+way for the partial aggregate to provide the upper join(s) with input values
+is to include the join input expression(s) in its grouping key. Besides the
+grouping columns, the partial aggregate can only produce the transient states
+of the aggregate functions, and aggregate functions cannot be referenced by
+join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does. In other words, all rows belonging to the same group have the
+same values of the join columns. (As mentioned above, a join cannot reference
+any output expression of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partial aggregation. The point is that those
+NULL values would not appear in the input of the pushed-down aggregate, so it
+could either put the rows into groups differently than the aggregate at the
+top of the plan, or compute wrong values for the aggregate functions.
+
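+For instance, in the following variant of the earlier example, columns b.j
+and b.y can be set to NULL by the outer join above "b", so this restriction
+prevents the aggregate from being pushed down to "b":
+
+  SELECT a.i, avg(b.y)
+  FROM a LEFT JOIN b ON a.i = b.j
+  GROUP BY a.i;
+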
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it's
+added to a separate RelOptInfo that we call a "grouped relation". A grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. A join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
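+
+Eager aggregation is disabled by default and is controlled by the
+enable_eager_aggregate GUC:
+
+  SET enable_eager_aggregate = on;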
--
2.31.0
Attachment: v1-0008-Add-test-cases.patch
From 09fed8131d6b2def5e5d76c7b73e86a9ae997c7a Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v1 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1270 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 205 +++
3 files changed, 1476 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..2d7dec8a5d
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1270 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 6 | 501
+ 7 | 502
+ 3 | 498
+ 4 | 499
+ 9 | 504
+ 5 | 500
+ 8 | 503
+(9 rows)
+
+-- Produce results with sorted aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 6 | 507
+ 7 | 509
+ 3 | 501
+ 4 | 503
+ 9 | 513
+ 5 | 505
+ 8 | 511
+(9 rows)
+
+-- Produce results with sorted aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Sort
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Sort Key: t3.a
+ -> Hash Left Join
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Hash Cond: (t3.b = t1.b)
+ -> Partial HashAggregate
+ Output: t3.a, t3.b, PARTIAL avg(t3.c)
+ Group Key: t3.a, t3.b
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(18 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 0 | 505
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------
+ HashAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Hash Right Join
+ Output: t3.a, t3.c
+ Hash Cond: (t3.b = t1.b)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(12 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 8 | 503
+ |
+ 9 | 504
+ 7 | 502
+ 1 | 496
+ 5 | 500
+ 4 | 499
+ 2 | 497
+ 6 | 501
+ 3 | 498
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(46 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ x | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY on another key that matches the partition key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(46 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ y | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 18 | 1300 | 100
+ 12 | 700 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(41 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ x | sum | count
+----+------+-------
+ 4 | 1200 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+ 2 | 600 | 50
+ 12 | 600 | 50
+ 8 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(67 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ x | sum
+----+-------
+ 4 | 18000
+ 2 | 14000
+ 8 | 26000
+ 6 | 22000
+ 0 | 10000
+ 16 | 22000
+ 10 | 10000
+ 14 | 18000
+ 12 | 14000
+ 18 | 26000
+ 26 | 22000
+ 28 | 26000
+ 22 | 14000
+ 20 | 10000
+ 24 | 18000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(76 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ x | sum | count
+----+-------+-------
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 4 | 4624 | 1156
+ 2 | 2312 | 1156
+ 0 | 0 | 1089
+ 6 | 6936 | 1156
+ 3 | 3468 | 1156
+ 11 | 11979 | 1089
+ 13 | 14157 | 1089
+ 10 | 11560 | 1156
+ 14 | 15246 | 1089
+ 12 | 13068 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 16 | 17424 | 1089
+ 15 | 16335 | 1089
+ 19 | 20691 | 1089
+ 24 | 26136 | 1089
+ 21 | 22869 | 1089
+ 23 | 25047 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 25 | 27225 | 1089
+ 29 | 31581 | 1089
+ 28 | 30492 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(64 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ y | sum | count
+----+-------+-------
+ 29 | 31581 | 1089
+ 4 | 4624 | 1156
+ 0 | 0 | 1089
+ 10 | 11560 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 15 | 16335 | 1089
+ 6 | 6936 | 1156
+ 26 | 28314 | 1089
+ 12 | 13068 | 1089
+ 24 | 26136 | 1089
+ 19 | 20691 | 1089
+ 25 | 27225 | 1089
+ 21 | 22869 | 1089
+ 14 | 15246 | 1089
+ 3 | 3468 | 1156
+ 17 | 18513 | 1089
+ 28 | 30492 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 13 | 14157 | 1089
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 18 | 19602 | 1089
+ 2 | 2312 | 1156
+ 16 | 17424 | 1089
+ 27 | 29403 | 1089
+ 23 | 25047 | 1089
+ 11 | 11979 | 1089
+ 8 | 9248 | 1156
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(111 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ x | sum | count
+----+---------+-------
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 4 | 314432 | 39304
+ 2 | 157216 | 39304
+ 0 | 0 | 35937
+ 6 | 471648 | 39304
+ 3 | 235824 | 39304
+ 11 | 790614 | 35937
+ 13 | 934362 | 35937
+ 10 | 786080 | 39304
+ 14 | 1006236 | 35937
+ 12 | 862488 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 16 | 1149984 | 35937
+ 15 | 1078110 | 35937
+ 19 | 1365606 | 35937
+ 24 | 1724976 | 35937
+ 21 | 1509354 | 35937
+ 23 | 1653102 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 25 | 1796850 | 35937
+ 29 | 2084346 | 35937
+ 28 | 2012472 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(99 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ y | sum | count
+----+---------+-------
+ 29 | 2084346 | 35937
+ 4 | 314432 | 39304
+ 0 | 0 | 35937
+ 10 | 786080 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 15 | 1078110 | 35937
+ 6 | 471648 | 39304
+ 26 | 1868724 | 35937
+ 12 | 862488 | 35937
+ 24 | 1724976 | 35937
+ 19 | 1365606 | 35937
+ 25 | 1796850 | 35937
+ 21 | 1509354 | 35937
+ 14 | 1006236 | 35937
+ 3 | 235824 | 39304
+ 17 | 1221858 | 35937
+ 28 | 2012472 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 13 | 934362 | 35937
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 18 | 1293732 | 35937
+ 2 | 157216 | 39304
+ 16 | 1149984 | 35937
+ 27 | 1940598 | 35937
+ 23 | 1653102 | 35937
+ 11 | 790614 | 35937
+ 8 | 628864 | 39304
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..250a9dba21 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..aba2c41557
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,205 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+-- Produce results with sorted aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+-- Produce results with sorted aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+
+-- GROUP BY on another key that matches the partition key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
Richard Guo <guofenglinux@gmail.com> writes:
Hi All,
Eager aggregation is a query optimization technique that partially
pushes a group-by past a join, and finalizes it once all the relations
are joined. Eager aggregation reduces the number of input rows to the
join and thus may result in a better overall plan. This technique is
thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
paper [1].
This is a really helpful but not easy task. Even though I am not sure when I
can spend time to study this, I want to say "Thanks for working on this!"
first, and I hope we can really make progress on this topic. Good luck!
--
Best Regards
Andy Fan
On Mon, Mar 4, 2024 at 7:49 PM Andy Fan <zhihuifan1213@163.com> wrote:
This is a really helpful but not easy task. Even though I am not sure when I
can spend time to study this, I want to say "Thanks for working on this!"
first, and I hope we can really make progress on this topic. Good luck!
Thanks. I hope this take can go even further and ultimately find its way to
being committed.
This needs a rebase after dbbca2cf29. I also revised the commit message
for 0007 and fixed a typo in 0009.
Thanks
Richard
Attachments:
Attachment: v2-0001-Introduce-RelInfoList-structure.patch (application/octet-stream)
From 7ad8bd304cdf5c2c1ef6b6c44d0ad780e1826137 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v2 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
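For readers following along, the new structure is essentially the old pair of
fields bundled together. A minimal sketch, inferred from the usages in the
diff below (the actual declaration lives in src/include/nodes/pathnodes.h and
may differ in field order and comments):

    typedef struct RelInfoList
    {
        NodeTag     type;       /* allocated via makeNode() */
        List       *items;      /* list of RelOptInfos (was join_rel_list) */
        struct HTAB *hash;      /* optional hashtable for faster lookups,
                                 * built once the list grows long
                                 * (was join_rel_hash) */
    } RelInfoList;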
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0b98f0856e..f8a5fbcb0a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 075d36c7ec..eb78e37317 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..9e25750acd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..be51e2c652 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation-specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
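A note on the mechanics above: find_rel_info() still does a linear scan of
->items for small problems, and only builds the hash table once the list
exceeds 32 entries (the threshold kept from the old join_rel_hash code). A
toy query that crosses that threshold, using hypothetical tables t1..t7:

    -- Illustrative only: with 7 base relations, the dynamic-programming
    -- join search can consider up to 2^7 - 1 = 127 distinct relid sets,
    -- so lookups by relids quickly outgrow a linear list scan.
    SELECT count(*)
    FROM t1
    JOIN t2 USING (id) JOIN t3 USING (id) JOIN t4 USING (id)
    JOIN t5 USING (id) JOIN t6 USING (id) JOIN t7 USING (id);

The same RelInfoList machinery now also backs the upper-rel lookups in
fetch_upper_rel(), which the later patches in this series rely on.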
Attachment: v2-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch
From 6c2afe0414f7c8a8b3df18d241f2912965ffda1c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v2 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 624 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 662 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 197a3f905e..0ff0ca99cb 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c88da963db..e7f465ef7b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * will also be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,567 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths, and if so, return
+ * the information needed to create them. The given relation is the
+ * non-grouped one, whose reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo, if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of their tleSortGroupRefs.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups having equal grouping expressions. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of a lower rel --- depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} and ph_eval_at = {B, C}.
+ * The path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+ * "(B JOIN C)" subjoin, while "(A JOIN B) JOIN C" evaluates the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * current rel should be grouped because we do not support join of two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClauses. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The variable is needed for a join; however, it's neither in
+ * the GROUP BY clause nor can it be derived from it using ECs.
+ * (Otherwise it would have been added to the targets above.)
+ * We need to construct a special SortGroupClause for this
+ * variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of the
+ * SortGroupClauses until we're done iterating over
+ * rel->reltarget->exprs. Also it makes sense for the caller to
+ * do some more checks before it starts to create those
+ * SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't actually be used for grouping key evaluation
+ * (instead, the expression it depends on will be), so its
+ * sortgroupref should not be important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If the eager aggregation will support generic grouping
+ * If eager aggregation is to support generic grouping
+ * expressions in the future, create_rel_agg_info() will have to
+ * add this variable to the "agg_input" target and also add the
+ * whole generic expression to "target".
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ *	  Check whether the given Var appears in Aggref(s) that we consider usable
+ *	  at base-relation / join level, and only in those Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+ break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+ * want to exclude the situation where the Var is only needed in final
+ * output. So include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known equal, due to
+ * equivalence relationships, to some other expression that can act as a
+ * grouping expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 69ed9eb1f6..3ef5195323 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
*/
 RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * List of RelAggInfos for grouped base and join relations, one per item
+ * of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..8d03ce2c57 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -344,4 +348,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040a047b81..dcea10888b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
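To make the mechanics of 0004 concrete, here is the transformation it
prepares for, written out by hand in SQL. This is only an illustration of
the path-level transformation --- the patch never rewrites the query text
--- and the table names come from the comment in is_var_needed_by_join():

    -- Original query:
    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

    -- Conceptually equivalent eager-aggregation form: partially aggregate
    -- "b", with the join column b.j added as a grouping key, then join and
    -- finalize.  avg() is split into sum/count only for this illustration;
    -- the planner instead carries partial-aggregate transition states.
    SELECT a.i, sum(pb.y_sum) / sum(pb.y_count) AS avg_y
    FROM a
    JOIN (SELECT b.j, sum(b.y) AS y_sum, count(b.y) AS y_count
          FROM b
          GROUP BY b.j) pb ON a.j = pb.j
    GROUP BY a.i;

Modulo details such as the exact result type, both forms return the same
answer, but the second one feeds the join one row per b.j value rather than
one row per row of b.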
Attachment: v2-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From 0e2fa155051357ede5f24a8ba1147e9009566572 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v2 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9e25750acd..c88da963db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ *	  Construct the auxiliary hash table for relation-specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- *	  Find a RelOptInfo entry.
+ *	  Find a RelOptInfo or RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ *	  Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index be51e2c652..d67f725ad6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
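To give the RelAggInfo fields a concrete reading, consider again the
avg(b.y) example from the 0004 patch above, with assumed tables a(i, j)
and b(j, y):

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

    -- My informal reading of the RelAggInfo built for base rel "b"
    -- (illustrative, not the output of any tool):
    --   relids:        {b}
    --   group_exprs /
    --   group_clauses: b.j                    (join column as grouping key)
    --   agg_exprs:     partial avg(b.y)
    --   target:        b.j, partial avg(b.y)  (emitted by the grouped paths)
    --   agg_input:     b.j, b.y               (emitted by their input paths)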
Attachment: v2-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 2a9bda403417daf3773f90478d345ac68c709cca Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v2 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0672d8458f..633b5b0af1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3303,6 +3305,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate the number of groups for the total and partial input paths. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 246cd8f747..dc5582adb7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2695,8 +2695,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2948,8 +2947,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -2995,8 +2993,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3154,8 +3151,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index dcea10888b..68fc05432c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
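A minimal way to exercise these new paths, assuming the
enable_eager_aggregate GUC added by the 0003 patch below and the toy schema
from the earlier example:

    SET enable_eager_aggregate = on;

    EXPLAIN (COSTS OFF)
    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

If a grouped path wins on cost, the plan should show a partial aggregation
step (e.g. Partial HashAggregate) on the "b" side below the join, with the
finalizing aggregation above the join.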
Attachment: v2-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From 80becc67b08e471cceac8426b6f6a1edf46f3891 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v2 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f8a5fbcb0a..0672d8458f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d3868b628d..db903796ec 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -79,6 +80,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -326,6 +329,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't even want
+ * them in the GROUP BY clause, so forbid them in general. It would need
+ * analysis whether evaluating a GROUP BY clause containing a SRF below
+ * the query targetlist is correct. Currently this does not seem to be
+ * an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, leave root->agg_clause_list as NIL
+ * and return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create a GroupExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, leave root->group_expr_list
+ * as NIL and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator from the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index eb78e37317..197a3f905e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -263,6 +265,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 45013582a7..96c7852821 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -938,6 +938,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index edcc0282b2..09d851b376 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -410,6 +410,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d67f725ad6..69ed9eb1f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * An aggregate expression that appears in the targetlist or HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * A grouping expression that appears in the GROUP BY clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e8a9c94ba..040a047b81 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
Attachment: v2-0006-Build-grouped-relations-out-of-base-relations.patch (application/octet-stream)
From acd58b0b5077d3804a375e74d8856c6bfdd1aa1e Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v2 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 633b5b0af1..b21f21589a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e7f465ef7b..83cdbb38bc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "rewrite/rewriteManip.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should not get here unless aggregate expressions and grouping
+ * expressions are available.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8d03ce2c57..6b856a5e77 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
Attachment: v2-0009-Add-README.patch (application/octet-stream)
From 19371326f53e2e7160f8930af9fba664ce7e7d95 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v2 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and finalize it once all the relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class
+{a.i, b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism fits here
+because it is designed to derive join clauses, and the join clauses in turn
+determine the choice of grouping columns for the partial aggregate: the only
+way for the partial aggregate to provide the upper join(s) with input values
+is to include the join input expression(s) in its grouping key. Besides the
+grouping columns, the partial aggregate can only produce the transient
+states of the aggregate functions, and aggregate functions cannot be
+referenced by join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row belonging to that group
+does. In other words, all rows belonging to the same group have the same
+values of the join columns. (As mentioned above, a join cannot reference any
+output expressions of the partial aggregate other than the grouping ones.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by a grouping expression or by
+an aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partial aggregation. The point is that those
+NULL values would not appear in the input of the pushed-down aggregate, so it
+could either put rows into groups differently than the aggregate at the top
+of the plan would, or it could compute wrong values of the aggregate functions.
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it
+is added to a separate RelOptInfo that we call a "grouped relation". A
+grouped relation can be joined to a non-grouped relation, which again results
+in a grouped relation. A join of two grouped relations does not seem to be
+very useful and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+paths are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Their
+further processing then does not differ from that of other partially grouped
+paths.
--
2.31.0
Attachment: v2-0007-Build-grouped-relations-out-of-join-relations.patch (application/octet-stream)
From 471e3b9044eb6b863de7128558fc2bf36ba1be21 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v2 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 115 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 35 ++++++--
src/backend/optimizer/util/appendinfo.c | 64 +++++++++++++
5 files changed, 320 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that the list in each
+ * of these RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+ * the original hashtables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b21f21589a..68ae7ef47f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3861,6 +3861,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3881,6 +3885,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4749,6 +4774,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..a9ef081597 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -864,6 +873,107 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ joinrelids = bms_union(rel1->relids, rel2->relids);
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ /* we should not see a dummy grouped relation */
+ Assert(rel1_grouped == NULL || !IS_DUMMY_REL(rel1_grouped));
+ Assert(rel2_grouped == NULL || !IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_grouped == NULL &&
+ rel2_grouped == NULL)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported. In such a
+ * case, grouping one side would change how often the other side's
+ * aggregate transient states appear in the input of the final
+ * aggregation. This could be handled by adjusting the transient states,
+ * but it's not worth the effort for now.
+ */
+ if (rel1_grouped != NULL &&
+ rel2_grouped != NULL)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (rel1_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (rel2_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1763,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ac97575453..8244134fcd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3939,10 +3939,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
+
+ /*
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
+ */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
- }
/*
* Estimate number of groups.
@@ -7036,6 +7042,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may have already been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7059,19 +7072,25 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may have already been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
!force_rel_creation)
- return NULL;
+ return partially_grouped_rel;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..b3a284214a 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
Attachment: v2-0008-Add-test-cases.patch (application/octet-stream)
From 6d8ee0fee84860efb3ba30911b6f87c1ce275686 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v2 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1270 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 205 +++
3 files changed, 1476 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..2d7dec8a5d
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1270 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 6 | 501
+ 7 | 502
+ 3 | 498
+ 4 | 499
+ 9 | 504
+ 5 | 500
+ 8 | 503
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 6 | 507
+ 7 | 509
+ 3 | 501
+ 4 | 503
+ 9 | 513
+ 5 | 505
+ 8 | 511
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Sort
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Sort Key: t3.a
+ -> Hash Left Join
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Hash Cond: (t3.b = t1.b)
+ -> Partial HashAggregate
+ Output: t3.a, t3.b, PARTIAL avg(t3.c)
+ Group Key: t3.a, t3.b
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(18 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 0 | 505
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------
+ HashAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Hash Right Join
+ Output: t3.a, t3.c
+ Hash Cond: (t3.b = t1.b)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(12 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 8 | 503
+ |
+ 9 | 504
+ 7 | 502
+ 1 | 496
+ 5 | 500
+ 4 | 499
+ 2 | 497
+ 6 | 501
+ 3 | 498
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(46 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ x | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(46 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ y | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 18 | 1300 | 100
+ 12 | 700 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(41 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ x | sum | count
+----+------+-------
+ 4 | 1200 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+ 2 | 600 | 50
+ 12 | 600 | 50
+ 8 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(67 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ x | sum
+----+-------
+ 4 | 18000
+ 2 | 14000
+ 8 | 26000
+ 6 | 22000
+ 0 | 10000
+ 16 | 22000
+ 10 | 10000
+ 14 | 18000
+ 12 | 14000
+ 18 | 26000
+ 26 | 22000
+ 28 | 26000
+ 22 | 14000
+ 20 | 10000
+ 24 | 18000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(76 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ x | sum | count
+----+-------+-------
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 4 | 4624 | 1156
+ 2 | 2312 | 1156
+ 0 | 0 | 1089
+ 6 | 6936 | 1156
+ 3 | 3468 | 1156
+ 11 | 11979 | 1089
+ 13 | 14157 | 1089
+ 10 | 11560 | 1156
+ 14 | 15246 | 1089
+ 12 | 13068 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 16 | 17424 | 1089
+ 15 | 16335 | 1089
+ 19 | 20691 | 1089
+ 24 | 26136 | 1089
+ 21 | 22869 | 1089
+ 23 | 25047 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 25 | 27225 | 1089
+ 29 | 31581 | 1089
+ 28 | 30492 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(64 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ y | sum | count
+----+-------+-------
+ 29 | 31581 | 1089
+ 4 | 4624 | 1156
+ 0 | 0 | 1089
+ 10 | 11560 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 15 | 16335 | 1089
+ 6 | 6936 | 1156
+ 26 | 28314 | 1089
+ 12 | 13068 | 1089
+ 24 | 26136 | 1089
+ 19 | 20691 | 1089
+ 25 | 27225 | 1089
+ 21 | 22869 | 1089
+ 14 | 15246 | 1089
+ 3 | 3468 | 1156
+ 17 | 18513 | 1089
+ 28 | 30492 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 13 | 14157 | 1089
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 18 | 19602 | 1089
+ 2 | 2312 | 1156
+ 16 | 17424 | 1089
+ 27 | 29403 | 1089
+ 23 | 25047 | 1089
+ 11 | 11979 | 1089
+ 8 | 9248 | 1156
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(111 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ x | sum | count
+----+---------+-------
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 4 | 314432 | 39304
+ 2 | 157216 | 39304
+ 0 | 0 | 35937
+ 6 | 471648 | 39304
+ 3 | 235824 | 39304
+ 11 | 790614 | 35937
+ 13 | 934362 | 35937
+ 10 | 786080 | 39304
+ 14 | 1006236 | 35937
+ 12 | 862488 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 16 | 1149984 | 35937
+ 15 | 1078110 | 35937
+ 19 | 1365606 | 35937
+ 24 | 1724976 | 35937
+ 21 | 1509354 | 35937
+ 23 | 1653102 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 25 | 1796850 | 35937
+ 29 | 2084346 | 35937
+ 28 | 2012472 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(99 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ y | sum | count
+----+---------+-------
+ 29 | 2084346 | 35937
+ 4 | 314432 | 39304
+ 0 | 0 | 35937
+ 10 | 786080 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 15 | 1078110 | 35937
+ 6 | 471648 | 39304
+ 26 | 1868724 | 35937
+ 12 | 862488 | 35937
+ 24 | 1724976 | 35937
+ 19 | 1365606 | 35937
+ 25 | 1796850 | 35937
+ 21 | 1509354 | 35937
+ 14 | 1006236 | 35937
+ 3 | 235824 | 39304
+ 17 | 1221858 | 35937
+ 28 | 2012472 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 13 | 934362 | 35937
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 18 | 1293732 | 35937
+ 2 | 157216 | 39304
+ 16 | 1149984 | 35937
+ 27 | 1940598 | 35937
+ 23 | 1653102 | 35937
+ 11 | 790614 | 35937
+ 8 | 628864 | 39304
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..250a9dba21 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..aba2c41557
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,205 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
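To make the transformation these tests exercise concrete, the first query of
the suite can be rewritten by hand into the shape the planner produces:
aggregate eager_agg_t2 below the join and finalize above it. This is only an
illustrative sketch against the test tables above; a plain subquery carrying
sum() and count() stands in for the executor's PARTIAL/Finalize aggregate
machinery:

-- Hand-written equivalent of the eager-aggregation plan for
--   SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1
--   JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
-- Partially aggregate t2 per join key b, join, then finalize per t1.a.
SELECT t1.a, sum(t2p.s) / sum(t2p.n) AS avg
FROM eager_agg_t1 t1
JOIN (SELECT b, sum(c) AS s, count(c) AS n
      FROM eager_agg_t2
      GROUP BY b) t2p ON t1.b = t2p.b
GROUP BY t1.a;

The join then sees one row per distinct t2.b rather than one per eager_agg_t2
row, which is where the row-count reduction visible in the plans above comes
from.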
On Tue, Mar 5, 2024 at 2:47 PM Richard Guo <guofenglinux@gmail.com> wrote:
This needs a rebase after dbbca2cf29. I also revised the commit message
for 0007 and fixed a typo in 0009.
Here is another rebase, mainly to make the test cases more stable by
adding ORDER BY clauses to the test queries. Also fixed more typos in
passing.
Thanks
Richard
Attachments:
v3-0001-Introduce-RelInfoList-structure.patch
From 44aca769b993d8d2e0882e6494fd8fd5e583b3de Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v3 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0b98f0856e..f8a5fbcb0a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 075d36c7ec..eb78e37317 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..9e25750acd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..a003433178 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
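A note on when the hash side of RelInfoList actually engages: find_rel_info()
switches from a linear scan of ->items to the hash table once the list holds
more than 32 entries, and fetch_upper_rel() now goes through the same
machinery. As a rough, hedged sketch with hypothetical tables c1 through c9: a
nine-relation chain join lets the dynamic-programming search record 36 join
relations (8 + 7 + ... + 1 contiguous subsets), which should be enough to
cross that threshold, provided join_collapse_limit is raised so the whole
chain is planned in a single search:

-- Hypothetical setup: nine single-column tables chained on id.
CREATE TABLE c1(id int);
CREATE TABLE c2(id int);
CREATE TABLE c3(id int);
CREATE TABLE c4(id int);
CREATE TABLE c5(id int);
CREATE TABLE c6(id int);
CREATE TABLE c7(id int);
CREATE TABLE c8(id int);
CREATE TABLE c9(id int);
SET join_collapse_limit = 16;  -- the default of 8 would split the search
-- The planner considers every contiguous join rel of the chain, so
-- join_rel_list grows past the 32-item threshold and build_rel_hash()
-- constructs the auxiliary hash table.
SELECT count(*)
FROM c1
JOIN c2 ON c1.id = c2.id
JOIN c3 ON c2.id = c3.id
JOIN c4 ON c3.id = c4.id
JOIN c5 ON c4.id = c5.id
JOIN c6 ON c5.id = c6.id
JOIN c7 ON c6.id = c7.id
JOIN c8 ON c7.id = c8.id
JOIN c9 ON c8.id = c9.id;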
v3-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From c49857b95ac2f8fe0d988b57181d4fbe13ce1043 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v3 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces the RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9e25750acd..c88da963db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookups for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find a RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a003433178..ad55d76169 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: there's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Code iterating over ->exprs may rely
+ * on this arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
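To make 0002's two pathtargets easier to picture, here is a quick sketch
(not planner output; the table and column names are the ones used in a
comment in 0004). For the query

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

the RelAggInfo built for relation "b" would contain roughly

    target:    b.j, PARTIAL avg(b.y)   -- emitted by the grouped paths
    agg_input: b.j, b.y                -- emitted by the paths feeding them

b.j is grouped on because the join clause above still needs it, while b.y
appears only in agg_input since the aggregate consumes it below the join.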
Attachment: v3-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (application/octet-stream)
From 1f3d144d68f8ddf43b51ed6e0d7cc603681a4f0f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v3 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f8a5fbcb0a..0672d8458f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d3868b628d..db903796ec 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -79,6 +80,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -326,6 +329,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't even want
+ * them in the GROUP BY clause, so forbid them in general. It would need
+ * analysis to determine whether evaluating a GROUP BY clause containing
+ * an SRF below the query targetlist is correct. Currently that does not
+ * seem to be an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in the targetlist and HAVING
+ * clause.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, leave root->agg_clause_list empty and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, leave root->group_expr_list
+ * empty and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator of the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index eb78e37317..197a3f905e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -263,6 +265,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 45013582a7..96c7852821 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -938,6 +938,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index edcc0282b2..09d851b376 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -410,6 +410,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad55d76169..42aeca880c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * An aggregate expression that appears in the targetlist or HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * A grouping expression that appears in the GROUP BY clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e8a9c94ba..040a047b81 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
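To experiment with 0003, the new GUC can be enabled per session. Below is
a minimal sketch of a query that passes all of setup_eager_aggregation()'s
checks (plain GROUP BY, no grouping sets, no ordered or DISTINCT
aggregates, no SRFs in the targetlist); the table and column names are
illustrative only:

    SET enable_eager_aggregate = on;

    SELECT t1.a, sum(t2.c)
    FROM t1 JOIN t2 ON t1.b = t2.b
    GROUP BY t1.a;

By contrast, a NUMERIC grouping key makes create_grouping_expr_infos()
bail out: values such as 0 and 0.0 compare equal under the btree opclass
yet differ in binary image, which the BTEQUALIMAGE_PROC check detects.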
Attachment: v3-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (application/octet-stream)
From 81dc9b5e7bb29474f3fa2deb9f8fde36f9e16c00 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v3 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 624 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 662 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 197a3f905e..0ff0ca99cb 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c88da963db..e0b36880cd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * is also returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,567 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths, and if so return
+ * the information needed to create them. The given relation is the
+ * non-grouped one, whose reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense without grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of lower rel --- depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B,
+ * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+ * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, it would have
+ * to be evaluated at some other relation, which would then have to be
+ * grouped as well; we do not support a join of two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The variable is needed for a join, however it's neither in
+ * the GROUP BY clause nor can it be derived from it using EC.
+ * (Otherwise it would have to be added to the targets above.)
+ * We need to construct special SortGroupClause for this
+ * variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of the
+ * SortGroupClause's until we're done with the iteration of
+ * rel->reltarget->exprs. Also it makes sense for the caller to
+ * do some more check before it starts to create those
+ * SortGroupClause's.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't actually be used for grouping key evaluation
+ * (instead, the one this depends on will be), so sortgroupref
+ * should not be important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If eager aggregation comes to support generic grouping
+ * expressions in the future, create_rel_agg_info() will have to add
+ * this variable to "agg_input" target and also add the whole
+ * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears only in Aggref(s) that we consider
+ * usable at this relation / join level.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+ break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+ * want to exclude the situation where the Var is only needed in final
+ * output. So include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known equal, via
+ * equivalence classes, to some other expr that can act as a grouping key.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 42aeca880c..adce1f6c94 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..8d03ce2c57 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -344,4 +348,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040a047b81..dcea10888b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
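One detail of 0004 worth calling out: a grouping expression does not have
to appear in GROUP BY verbatim. get_expression_sortgroupref() also accepts
an expression that is known equal to some grouping expression under the
grouping key's btree opfamily, using the extended exprs_known_equal(). For
example (again with illustrative table and column names):

    SELECT t1.a, avg(t2.y)
    FROM t1 JOIN t2 ON t1.a = t2.x
    GROUP BY t1.a;

Here the join clause puts t1.a and t2.x into one equivalence class, so
t2.x can serve as the grouping key when the partial aggregation of
avg(t2.y) is pushed down to t2.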
Attachment: v3-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch (application/octet-stream)
From 6ee05b8308242cfe78879ff5c8eb4d45ac363a6e Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v3 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0672d8458f..633b5b0af1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3303,6 +3305,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 246cd8f747..dc5582adb7 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2695,8 +2695,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2948,8 +2947,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -2995,8 +2993,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3154,8 +3151,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index dcea10888b..68fc05432c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
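With 0005 in place, the grouped relation for "t2" in the example above
gets sorted and hashed partial-aggregation paths. Once the later patches
in the series join such grouped rels to the remaining plain rels and
finalize the aggregation, the intended plan shape is roughly the
following (an illustrative sketch, not actual EXPLAIN output):

    Finalize HashAggregate
      Group Key: t1.a
      ->  Hash Join
            Hash Cond: (t2.x = t1.a)
            ->  Partial HashAggregate
                  Group Key: t2.x
                  ->  Seq Scan on t2
            ->  Hash
                  ->  Seq Scan on t1

The join then processes one row per t2.x group instead of every row of
t2, which is where the cost savings of eager aggregation come from.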
Attachment: v3-0006-Build-grouped-relations-out-of-base-relations.patch (application/octet-stream)
From e6812d749d802f32b5b8ccdb3c409bfc91ebf7fd Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v3 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 633b5b0af1..b21f21589a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that "other rels" are
+ * intentionally included here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e0b36880cd..b2b9d61c98 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "rewrite/rewriteManip.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should not get here unless there are available aggregate
+ * expressions and grouping expressions.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8d03ce2c57..6b856a5e77 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
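To summarize the control flow that 0006 wires into the planner: grouped base
relations are handled in two phases. They are created right after
set_base_rel_sizes(), so that they are visible when the join search begins,
and their aggregation paths are built only at the end of set_rel_pathlist(),
because generate_grouped_paths() needs the plain rel's cheapest paths. A
simplified sketch (applicability checks and assertions elided):

    static void
    eager_agg_baserel_flow(PlannerInfo *root, RelOptInfo *rel)
    {
        RelAggInfo *agg_info;
        RelOptInfo *rel_grouped;

        /* Phase 1: just after set_base_rel_sizes() */
        rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
        if (rel_grouped)
            add_grouped_rel(root, rel_grouped, agg_info);

        /* Phase 2: at the end of set_rel_pathlist(), after set_cheapest(rel) */
        rel_grouped = find_grouped_rel(root, rel->relids, &agg_info);
        if (rel_grouped)
        {
            generate_grouped_paths(root, rel_grouped, rel, agg_info);
            set_cheapest(rel_grouped);
        }
    }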
Attachment: v3-0007-Build-grouped-relations-out-of-join-relations.patch
From 981e711edcc2dde89b1bb5b00d45ba00bc2d1ab6 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v3 7/9] Build grouped relations out of join relations
This commit builds a grouped relation for each just-processed join
relation where possible, and generates aggregation paths for the grouped
join relations.
The changes to make_join_rel() are relatively minor: a new function,
make_grouped_join_rel(), finds or creates a grouped relation for the
just-processed joinrel and generates grouped paths by joining a grouped
input relation with a non-grouped input relation.
The other way to generate grouped paths is to add sorted and hashed
partial aggregation paths on top of the joinrel's own paths. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of it. Additionally, this
patch extends eager aggregation to work with partitionwise join and
GEQO.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 115 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 35 ++++++--
src/backend/optimizer/util/appendinfo.c | 64 +++++++++++++
5 files changed, 320 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that the list in each
+ * of these RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+ * the original hash tables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b21f21589a..68ae7ef47f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3861,6 +3861,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3881,6 +3885,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4749,6 +4774,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..a9ef081597 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -864,6 +873,107 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ joinrelids = joinrel->relids; /* includes any outer-join relids */
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ /* we should not see a dummy grouped relation */
+ Assert(rel1_grouped == NULL || !IS_DUMMY_REL(rel1_grouped));
+ Assert(rel2_grouped == NULL || !IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_grouped == NULL &&
+ rel2_grouped == NULL)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported: grouping
+ * one side would change the number of times the other side's aggregate
+ * transient states appear in the input of the final aggregation. This
+ * could be handled by adjusting the transient states, but it's not
+ * worth the effort for now.
+ */
+ if (rel1_grouped != NULL &&
+ rel2_grouped != NULL)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (rel1_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (rel2_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1763,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index ac97575453..8244134fcd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3939,10 +3939,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
+
+ /*
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
+ */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
- }
/*
* Estimate number of groups.
@@ -7036,6 +7042,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7059,19 +7072,25 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
!force_rel_creation)
- return NULL;
+ return partially_grouped_rel;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..b3a284214a 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* The exprs must be adjusted regardless of sortgrouprefs */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
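Regarding the outer-join restriction in 0007: the check itself lives in
create_rel_agg_info() (patch 0004) and is not repeated here, but the idea
reduces to a simple test on the outer-join-aware Vars. A minimal sketch,
with a hypothetical helper name: a Var whose varnullingrels mentions an
outer join that is not computed within the given relation can still be
nulled above it, so grouping on it or aggregating over it below that join
would produce wrong results.

    /* Hypothetical helper, illustration only. */
    static bool
    var_safe_for_eager_agg(Var *var, RelOptInfo *rel)
    {
        /*
         * Safe only if every outer join that can null this Var is already
         * computed within 'rel' (rel->relids includes outer-join relids).
         */
        return bms_is_subset(var->varnullingrels, rel->relids);
    }

If any Var used in the grouping or aggregate expressions fails this test
for the relation at hand, create_rel_agg_info() is expected to return NULL,
and no grouped relation is built for that relation.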
Attachment: v3-0008-Add-test-cases.patch
From d54a328d1a1f9d4a0a0ad0c0498d0b5815cb5585 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v3 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, aggregate the result, join it to the other table and
+-- finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two of the tables, aggregate the result, join it to the remaining
+-- table and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..250a9dba21 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
v3-0009-Add-README.patch
From 4fdd52c035e79908d21fe1bcea6766ae62bfdd34 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v3 9/9] Add README
---
 src/backend/optimizer/README | 103 +++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,106 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, so
+the number of outer-to-inner tuple pairs to be checked can be significantly
+lower, which can in turn lead to a considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it is designed to derive join clauses, and the join clauses in turn
+determine the choice of grouping columns for the partial aggregate: the only
+way for the partial aggregate to provide the upper join(s) with input values
+is to have the join input expression(s) in its grouping key. Besides the
+grouping columns, the partial aggregate can only produce the transient states
+of the aggregate functions, and aggregate functions cannot be referenced by
+join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does. In other words, all rows in the same group have the same
+values of the join columns. (A join clause cannot reference any output
+expression of the partial aggregate other than the grouping expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partial aggregation. The point is that those
+NULL values would not appear in the input of the pushed-down aggregate, so it
+could either put rows into groups differently than the aggregate at the top of
+the plan, or compute wrong values of the aggregate functions.
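+
+For example, in a query like
+
+ SELECT b.j, avg(b.y)
+ FROM a LEFT JOIN b ON a.i = b.j
+ GROUP BY b.j;
+
+both b.j and b.y can be set to NULL by the left join, so the partial
+aggregation cannot be applied to the scan of "b": the NULL-extended rows
+produced by the join would never reach its input.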
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it is
+added to a separate RelOptInfo that we call a "grouped relation". A grouped
+relation can be joined to a non-grouped relation, which again results in a
+grouped relation. A join of two grouped relations does not seem very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from that of other partially
+grouped paths.
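+
+Eager aggregation is disabled by default and can be enabled with the
+enable_eager_aggregate GUC:
+
+ SET enable_eager_aggregate TO on;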
--
2.31.0
On Tue, Mar 5, 2024 at 7:19 PM Richard Guo <guofenglinux@gmail.com> wrote:
Here is another rebase, mainly to make the test cases more stable by
adding ORDER BY clauses to the test queries. Also fixed more typos in
passing.
This needs another rebase after 97d85be365. I also addressed several
issues that I identified during self-review, which include:
* In some cases GroupPathExtraData.agg_final_costs, which is the cost of
the final aggregation, fails to be calculated. This can lead to bogus cost
estimates and can end up producing an unexpected plan.
* If the cheapest partially grouped path is generated through eager
aggregation, the number of groups estimated for the final phase will be
different from the number of groups estimated for non-split aggregation.
That is to say, we should not use 'dNumGroups' for the final aggregation
in add_paths_to_grouping_rel().
* It is possible that we generate dummy grouped join relations, which
would trigger the Assert in make_grouped_join_rel().
* More typo fixes.
Thanks
Richard
Attachments:
v4-0001-Introduce-RelInfoList-structure.patch
From 07f968865f1ac7b0451958a31c28dc44d3726fce Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v4 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0b98f0856e..f8a5fbcb0a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 075d36c7ec..eb78e37317 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d791c4108d..a0a94dfe3b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..a003433178 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation-specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
v4-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From cd869ba7c7fcd0ec7b5acf1c717d14f84fc709fe Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v4 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a0a94dfe3b..b0bb4ae532 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookups for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation-specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find a RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a003433178..1c0655074c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from the reltarget of the non-grouped
+ * relation is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
v4-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From 412cab28dd97fa751318ab7c0c279832b2a27944 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v4 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f8a5fbcb0a..0672d8458f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d3868b628d..db903796ec 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -79,6 +80,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -326,6 +329,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't even want them
+ * in the GROUP BY clause, so forbid them in general. It would need to be
+ * analyzed whether evaluating a GROUP BY clause containing SRFs below the
+ * query targetlist would be correct. Currently it does not seem to be an
+ * important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+		 * Get the equality operator of this btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index eb78e37317..197a3f905e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -263,6 +265,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 1e71e7db4a..184a8b31f9 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -938,6 +938,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2244ee52f7..3f06bae40c 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -412,6 +412,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1c0655074c..d9a67ace6e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * AggClauseInfo describes an aggregate expression that appears in the
+ * targetlist or in the HAVING clause.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * GroupExprInfo describes a grouping expression that appears in the
+ * grouping clauses.
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index b137c8a589..e8639f07e6 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
v4-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch
From 05aba8b25c9c2ed4275a696b0e1f235574f6f10d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v4 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 623 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 661 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
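+ *
+ * For example, selectivity-estimation callers pass InvalidOid, since a
+ * fuzzy notion of equality suffices there, whereas eager aggregation
+ * passes the btree opfamily of the grouping expression so that a match
+ * implies equality under that opfamily's semantics.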
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 197a3f905e..0ff0ca99cb 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b0bb4ae532..79288fb2d3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * is also returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,566 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ *    Check if the given relation can produce grouped paths, and if so,
+ *    return the information needed to create them. The given relation is
+ *    the non-grouped one, whose reltarget has already been constructed.
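+ *
+ * For the query shown in the header comment of is_var_needed_by_join(),
+ * pushing avg(b.y) down to relation b yields a RelAggInfo whose 'target'
+ * is roughly {b.j, PARTIAL avg(b.y)} and whose 'agg_input' is {b.j, b.y},
+ * with b.j acting as a grouping key because the join clause needs it.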
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+	 * If this is a child rel, the grouped rel for its parent rel has
+	 * already been created if that was possible. So we can just use the
+	 * parent's RelAggInfo if there is one, with appropriate variable
+	 * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+	 * If the aggregation target needs extra grouping expressions (in order
+	 * to emit input vars for join conditions), add them now, assigning new
+	 * tleSortGroupRefs as we go.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+	 * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
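+ *
+ * For example, an aggregate like sum(a.x + b.y) has agg_eval_at = {a, b},
+ * so grouped paths can be produced for the join relation {a, b}, but not
+ * for a or b alone.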
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+	 * The problem is that the same PHV can be evaluated in the target of
+	 * the current rel or in that of a lower rel, depending on the input
+	 * paths. For example, consider rel->relids = {A, B, C} with
+	 * ph_eval_at = {B, C}. The path "A JOIN (B JOIN C)" evaluates the PHV
+	 * in "(B JOIN C)", while the path "(A JOIN B) JOIN C" must evaluate
+	 * the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * current rel should be grouped because we do not support join of two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
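+ *
+ * For example, given "SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i", the Var b.j is neither in the GROUP BY clause nor
+ * derivable from it, yet the join clause needs it, so it ends up in
+ * *group_exprs_extra_p.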
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+				 * The variable is needed for a join, but it is neither in
+				 * the GROUP BY clause nor derivable from it using an EC.
+				 * (Otherwise it would have been added to the targets
+				 * above.) We need to construct a special SortGroupClause
+				 * for this variable.
+				 *
+				 * Note that its tleSortGroupRef needs to be unique within
+				 * agg_input, so we must postpone creation of the
+				 * SortGroupClauses until we're done iterating over
+				 * rel->reltarget->exprs. It also makes sense for the
+				 * caller to perform some more checks before it starts to
+				 * create those SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+			 * The var shouldn't actually be used for grouping key
+			 * evaluation (instead, the one it depends on will be), so its
+			 * sortgroupref should not matter.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+			 * If eager aggregation supports generic grouping expressions
+			 * in the future, create_rel_agg_info() will have to add this
+			 * variable to the "agg_input" target and also add the whole
+			 * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether given Var appears in Aggref(s) which we consider usable at
+ * relation / join level, and only in the Aggref(s).
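+ *
+ * For example, b.y in avg(b.y) qualifies if it appears nowhere else in
+ * the targetlist: it must then be emitted by the aggregation input paths,
+ * but it need not appear in the grouping target.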
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+ break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+	 * When checking whether the Var is needed by joins above, we want to
+	 * exclude the case where the Var is needed only in the final output.
+	 * So include "relation 0", which represents the final targetlist, in
+	 * the relids here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known, through
+ * equivalence relationships, to be equal to other expressions that can
+ * act as grouping expressions.
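+ *
+ * For example, given GROUP BY a.i and a join clause a.i = b.j, the Var
+ * b.j can act as a grouping expression for relation b, since an
+ * equivalence class makes it known equal to a.i under the corresponding
+ * btree opfamily.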
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d9a67ace6e..4b21aded5a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 99c2f955aa..91ce637f9e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -346,4 +350,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index e8639f07e6..21fb06872d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
v4-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 281ac06b3250eb08f5661d21d918814278c6bd00 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v4 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0672d8458f..633b5b0af1 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3303,6 +3305,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
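+ *
+ * The generated paths have roughly the shape
+ *
+ *     Partial Aggregate (sorted or hashed)
+ *         -> [Sort or Incremental Sort]
+ *             -> path of the plain relation
+ *
+ * and their output can then be joined before the final aggregation.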
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+	/* Estimate the numbers of total groups and partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0a7e5c2678..c87990bdfc 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2707,8 +2707,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2960,8 +2959,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3007,8 +3005,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3166,8 +3163,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 21fb06872d..3f0a151289 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
v4-0006-Build-grouped-relations-out-of-base-relations.patch
From 3b0f425cd220303c66548372960b011899cff6a2 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v4 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 633b5b0af1..b21f21589a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 79288fb2d3..0b11ba15ef 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We should have aggregate expressions and grouping expressions
+	 * available, otherwise we could not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 91ce637f9e..41818c5189 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -308,6 +308,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
v4-0007-Build-grouped-relations-out-of-join-relations.patch
From 0c45ee6c9464568820d575408ab40729b68122ef Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v4 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
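
For illustration (hypothetical tables a and b), a query such as

    SET enable_eager_aggregate = on;
    EXPLAIN SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;

is expected to place a partial aggregation over b below the join, with
the aggregation finalized above the join output.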
---
src/backend/optimizer/geqo/geqo_eval.c | 84 ++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 +++++++++
src/backend/optimizer/path/joinrels.c | 123 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 84 +++++++++++-----
src/backend/optimizer/util/appendinfo.c | 64 ++++++++++++
5 files changed, 360 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that the list in
+ * each of these RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+ * back the original hash tables, if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b21f21589a..68ae7ef47f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3861,6 +3861,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3881,6 +3885,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4749,6 +4774,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..ac6533888c 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -864,6 +873,115 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ joinrelids = bms_union(rel1->relids, rel2->relids);
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ bms_free(joinrelids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if neither input rel has a non-dummy grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported: grouping
+ * one side would change how many times the other side's aggregate
+ * transition states appear in the input of the final aggregation. This
+ * could be compensated for by adjusting those states, but it doesn't
+ * seem worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1771,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5564826cb4..1e45d4eb27 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -221,7 +221,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3856,9 +3855,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3939,23 +3936,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6786,16 +6781,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -6907,7 +6928,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -6924,7 +6945,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -6932,7 +6953,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -6974,19 +6995,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7036,6 +7055,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7059,19 +7085,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..b3a284214a 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
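To make the outer-join rule described in the 0007 commit message
concrete: the partial aggregate may be pushed down to the non-nullable
side of an outer join, but must be refused when a column referenced by
the grouping expressions or aggregate functions is nullable by an outer
join above the relation, since the values the join nulls no longer
agree with the values that were grouped below it. Both cases are
exercised in the test patch below; for instance (same test tables):

-- aggregate over t2, the non-nullable side: pushdown is allowed
EXPLAIN (COSTS OFF)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a ORDER BY t1.a;

-- t2.b is nullable by the LEFT JOIN: no pushdown happens
EXPLAIN (COSTS OFF)
SELECT t2.b, avg(t2.c)
FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t2.b ORDER BY t2.b;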
Attachment: v4-0008-Add-test-cases.patch
From f2329176bbf64f79b0686d24720efd57f7ae3e68 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v4 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 5ac6e871f5..ffde0be70e 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
v4-0009-Add-README.patch
From d3b1d1474b6e9daef96ff74a28520b9d12d3b663 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v4 9/9] Add README
---
src/backend/optimizer/README | 105 +++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,108 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, so
+the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which in turn can lead to a considerably lower join cost.
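+
+For instance, if "b" has 1,000,000 rows but only 1,000 distinct values of
+b.j (numbers chosen purely for illustration), the join receives 1,000 rows
+from the aggregated side instead of 1,000,000.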
+
+Note that the GROUP BY expression might not be directly usable for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and the join clauses in turn
+determine the choice of grouping columns of the partial aggregate: the only
+way for the partial aggregate to provide the upper join(s) with input values
+is to have the join input expression(s) in its grouping key. Besides the
+grouping columns, the partial aggregate can only produce the transition states
+of the aggregate functions, and aggregate functions cannot be referenced by
+join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row belonging to that group
+does. In other words, all rows belonging to the same group have the same
+values of the join columns. (As mentioned above, a join cannot reference any
+output expression of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the
+relation to which we want to apply the partial aggregation. The point is that
+those NULL values would not appear on the input of the pushed-down aggregate,
+so it could either put rows into groups in a different way than the aggregate
+at the top of the plan, or it could compute wrong values of the aggregate
+functions.
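+
+For instance, in the following query (using the tables from the examples
+above), the aggregate cannot be pushed down to the scan of "b": rows of "a"
+that have no match in "b" produce NULL values of b.j and b.y above the join,
+and a partial aggregate computed below the join would never see them:
+
+  SELECT b.j, avg(b.y)
+  FROM a LEFT JOIN b ON a.i = b.j
+  GROUP BY b.j;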
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it's
+added to a separate RelOptInfo that we call a "grouped relation"; in the
+example plans above, the Partial HashAggregate belongs to the grouped relation
+built for "b" (or for the join of "b" and "c"). A grouped relation can be
+joined to a non-grouped relation, which results in a grouped relation too. A
+join of two grouped relations does not seem to be very useful and is currently
+not supported.
+
+If query_planner produces a grouped relation that contains valid paths, those
+paths are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from the processing of other
+partially grouped paths.
--
2.31.0
There is a conflict in the parallel_schedule file, so here is another rebase.
Nothing else has changed.
Thanks
Richard
Attachments:
v5-0001-Introduce-RelInfoList-structure.patch
From e3a6d3a84b4a5213dd93224e13cf55b6fbec11db Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v5 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cc51ae1757..ffc6edd6c7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3415,9 +3415,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 075d36c7ec..eb78e37317 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d791c4108d..a0a94dfe3b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find an RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ab25d9ce7..25a9119802 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +426,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
v5-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From a199ef034b0bbb5375c97d4bf0ee0fd79ca40690 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v5 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces the RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index a0a94dfe3b..b0bb4ae532 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ * Find an RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 25a9119802..402b9b5874 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1070,6 +1070,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from the reltarget of the non-grouped
+ * relation is that some items can have their sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. it is not
+ * initialized when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
v5-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From 1717905be8df182b2f1c4205828e5dbf2364c9ac Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v5 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ffc6edd6c7..586c0e07c0 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d3868b628d..db903796ec 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -79,6 +80,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -326,6 +329,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't want them in
+ * the GROUP BY clause either, so forbid them in general. It would need
+ * analysis to determine whether evaluating a GROUP BY clause containing an
+ * SRF below the query targetlist would be correct. Currently that does not
+ * seem to be an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the operator in the btree's opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index eb78e37317..197a3f905e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -263,6 +265,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c68fdc008b..5617a90db5 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -940,6 +940,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2166ea4a87..27b6515cd3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 402b9b5874..46b68fab9d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -386,6 +386,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3203,6 +3209,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 39ba461548..8f2bd60d47 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
Attachment: v5-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (application/octet-stream)
From afc00dedb7067ca6a74fec0317d779edb287fc65 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v5 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
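
As a concrete illustration (a made-up schema, not part of the patch): in the
query below, b.j does not appear in GROUP BY, but the join clause's
equivalence class makes it known equal to a.j under the relevant btree
opfamily, so b.j can act as the grouping expression when we build the
RelAggInfo for relation b:

    SELECT a.j, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.j;

This is the case the exprs_known_equal() change below caters for.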
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 623 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 661 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index a619ff9177..98cfb68ce2 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+		/* Ignore if this EC doesn't mention the specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 197a3f905e..0ff0ca99cb 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b0bb4ae532..79288fb2d3 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * will also be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,566 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths and return the
+ * information it'll need for it. The given relation is the non-grouped one
+ * which has the reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+	 * If this is a child rel, the grouped rel for its parent must already
+	 * have been created if one is possible at all. So we can simply reuse the
+	 * parent's RelAggInfo, if any, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+	 * order to emit input vars for join conditions), add them now. This step
+	 * includes assigning the tleSortGroupRefs we generate for them.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+	 * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+	/* Estimate the number of groups produced by the grouping expressions. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+	 * current rel or in that of a lower rel --- depending on the input paths.
+	 * For example, consider rel->relids = {A, B, C} with ph_eval_at = {B,
+	 * C}. The path "A JOIN (B JOIN C)" implies that the PHV is evaluated by
+	 * "(B JOIN C)", while the path "(A JOIN B) JOIN C" evaluates the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+		 * If the aggregate does not need the current rel at all, it must be
+		 * evaluated on some other relation, which would then be the grouped
+		 * one; and we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClauses. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+				 * The variable is needed for a join; however, it's neither in
+				 * the GROUP BY clause nor can it be derived from it using ECs.
+				 * (Otherwise it would have been added to the targets above.)
+				 * We need to construct a special SortGroupClause for this
+ * variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of the
+				 * SortGroupClauses until we're done iterating over
+				 * rel->reltarget->exprs. Also it makes sense for the caller to
+				 * do some more checks before it starts to create those
+				 * SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+			 * The var shouldn't actually be used for grouping key evaluation
+			 * (instead, the expression it depends on will be), so sortgroupref
+ * should not be important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+			 * If eager aggregation supports generic grouping expressions in
+			 * the future, create_rel_agg_info() will have to add this
+			 * variable to the "agg_input" target and also add the whole
+			 * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ *	  Check whether the given Var appears only in Aggref(s) that we consider
+ *	  usable at relation / join level.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+ break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise its
+ * value would not be available as input to the join clause above.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+	 * Note that when we check whether the Var is needed by joins above, we
+	 * want to exclude the case where the Var is only needed in the final
+	 * output, which attr_needed represents as "relation 0"; so include it here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also check whether 'expr' is known equal, via equivalence
+ * classes, to some expression that can act as a grouping expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+	/* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 35f8f306ee..611c172ee4 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46b68fab9d..9378e7d913 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -434,6 +434,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c5c4756b0f..d973bff8ff 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -313,6 +313,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -347,4 +351,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8f2bd60d47..31eed6b6a8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -162,7 +162,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
Attachment: v5-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch (application/octet-stream)
From 611f3fa8a011ef1eb5ef0c272226de7036b37d4b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v5 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
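
Schematically, the added paths have the following shapes (an illustration of
the path trees, not literal planner output):

    Partial Aggregate (AGG_SORTED)
      -> Sort or Incremental Sort  (omitted if the input is suitably sorted)
        -> Projection (agg_input target)
          -> path of the plain relation

    Partial Aggregate (AGG_HASHED)
      -> Projection (agg_input target)
        -> cheapest path of the plain relation

The projection step is needed because the plain relation's paths are not
aware of eager aggregation and may not emit exactly the columns the partial
aggregation expects as input.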
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 586c0e07c0..3f3dbc486e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3308,6 +3310,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ *	  The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+	/* Estimate the number of groups, for non-partial and partial inputs. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3cf1dac087..70fa25a67b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 31eed6b6a8..947f814f4f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
Attachment: v5-0006-Build-grouped-relations-out-of-base-relations.patch (application/octet-stream)
From 73a290c8f54abdb12ada576314ac6037ef4de1ee Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v5 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
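
For illustration, reusing the made-up schema from the code comments:

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

Here this step builds a grouped relation for base rel b whose paths perform
the partial avg(b.y) aggregation grouped by b.j; joining that grouped
relation to a is left to the next patch in the series.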
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3f3dbc486e..ef699ab630 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 79288fb2d3..0b11ba15ef 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We should have aggregate expressions and grouping expressions available;
+	 * otherwise we would not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ *	  Build a grouped relation by flat-copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d973bff8ff..d4b4499db3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -309,6 +309,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
Attachment: v5-0007-Build-grouped-relations-out-of-join-relations.patch (application/octet-stream)
From 605f40ce237b1d80ef764aabf2865b7d795f1eba Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v5 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
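(As an illustration: for a join of relations A and B, the grouped joinrel
can get paths from grouped(A) joined to plain B, or from plain A joined to
grouped(B); joining two grouped inputs is never considered.)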
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
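As an illustration of this restriction (my reading of it, with a made-up
schema):

    SELECT a.i, avg(b.y)
    FROM a LEFT JOIN b ON a.j = b.j
    GROUP BY a.i;

Here b.y is nullable by the LEFT JOIN above b, so the partial aggregation is
not pushed down to b.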
Starting from this patch, you should be able to see plans with eager
aggregation.
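For example, for the inner-join form of the query above, I would expect a
plan shape roughly like this (an illustrative sketch, not actual EXPLAIN
output):

    Finalize HashAggregate
      Group Key: a.i
      ->  Hash Join
            Hash Cond: (a.j = b.j)
            ->  Seq Scan on a
            ->  Hash
                  ->  Partial HashAggregate
                        Group Key: b.j
                        ->  Seq Scan on b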
---
src/backend/optimizer/geqo/geqo_eval.c | 84 ++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 +++++++++
src/backend/optimizer/path/joinrels.c | 123 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 84 +++++++++++-----
src/backend/optimizer/util/appendinfo.c | 64 ++++++++++++
5 files changed, 360 insertions(+), 43 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+	 * Restore each of the lists in join_rel_list, agg_info_list and
+	 * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+	 * the original hashtables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ef699ab630..0e2c984442 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3866,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3886,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4754,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index f3a9412d18..3cda36c72e 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,115 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for 'joinrel' if eager aggregation is
+ * applicable and 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
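+ /*
+ * See whether a grouped relation for this join has already been built.
+ * make_join_rel() can be called repeatedly for the same joinrel, once
+ * for each legal pair of input rels.
+ */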
+ joinrelids = bms_union(rel1->relids, rel2->relids);
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ bms_free(joinrelids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * Joining two grouped relations is currently not supported: grouping
+ * one side would change how many times the other side's aggregate
+ * transition states appear in the input of the final aggregation.
+ * This could be compensated for by adjusting the transition states,
+ * but it does not seem worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1671,6 +1789,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..4a6386a09d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3913,9 +3912,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3996,23 +3993,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6843,16 +6838,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
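+ /*
+ * Note the distinction between the two estimates below: dNumGroups is
+ * the number of groups expected when aggregating the plain input rows
+ * directly, while dNumFinalGroups is the number of groups the final
+ * phase of partial aggregation emits, given input rows that have
+ * already been partially aggregated.
+ */
+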
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -6964,7 +6985,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -6981,7 +7002,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -6989,7 +7010,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7031,19 +7052,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7093,6 +7112,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7116,19 +7142,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..b3a284214a 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
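+ /* Now translate the fields that may reference parent Vars or relids. */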
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* Translate the expressions whether or not sortgrouprefs is set */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
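To see the feature in action once the patch set is applied, a query along the
lines of the regression tests in the attached test patch (this sketch reuses
its eager_agg_t1/eager_agg_t2 tables and the enable_eager_aggregate GUC)
should show a Partial Aggregate below the join and a Finalize aggregate node
above it:

    SET enable_eager_aggregate TO on;
    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT t1.a, avg(t2.c)
      FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
     GROUP BY t1.a;

With 1000 rows in eager_agg_t2 but only 10 distinct values of b, the
pushed-down partial aggregate shrinks the join's inner input from 1000 rows
to 10 groups, which is where the improvement over the non-pushed plan comes
from.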
Attachment: v5-0008-Add-test-cases.patch
From 3e62f71cac2c10f8f940dd6f1b1a34fd9e6637ae Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v5 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, partially aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, partially aggregate the result, join it to the third
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is
+-- performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY on the other relation's matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial
+-- aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check eager aggregation over a join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is
+-- performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial
+-- aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check eager aggregation over a join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 675c567617..0f6b3e78a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Scan one table, partially aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
v5-0009-Add-README.patch (application/octet-stream)
From fadfe4f66aa2c8e1dc8d9f5a86e8fb5a25fdded6 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v5 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node.  However, if the groups are large enough, it may be
+more efficient to apply partial aggregation to the output of a base relation
+scan, and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be directly usable by the partial
+aggregate.  In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a".  However, the equivalence class
+{a.i, b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table.  The equivalence class mechanism is well suited
+here because it is designed to derive join clauses, and the join clauses in
+turn determine the choice of grouping columns for the partial aggregate: the
+only way for the partial aggregate to provide the upper join(s) with input
+values is to have the join input expression(s) in its grouping key.  Besides
+the grouping columns, the partial aggregate can only produce the transient
+states of the aggregate functions, and aggregate functions cannot be
+referenced by join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan.  That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does.  In other words, all rows belonging to the same group have the
+same values of the join columns.  (As mentioned above, a join cannot reference
+any output expressions of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the
+relation to which we want to apply the partial aggregation.  The point is that
+those NULL values would not appear in the input of the pushed-down aggregate,
+so it could either assign rows to groups differently than the aggregate at the
+top of the plan would, or compute wrong values of the aggregate functions.
+
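+For instance, in the following hypothetical query (reusing tables "a" and
+"b" from the example above), the columns of "b" can be set to NULL by the
+left join whenever a row of "a" has no match:
+
+  SELECT b.j, avg(b.y)
+  FROM a LEFT JOIN b ON a.i = b.j
+  GROUP BY b.j;
+
+Those NULL-extended rows exist only above the join, so a partial aggregate
+pushed down to the scan of "b" would never see them; it would both miss the
+NULL group and combine rows differently than the final aggregate expects.
+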
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it
+is added to a separate RelOptInfo that we call a "grouped relation".  A
+grouped relation can be joined to a non-grouped relation, which results in a
+grouped relation too.  A join of two grouped relations does not seem very
+useful and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, those
+paths are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation.  From that
+point on, their processing does not differ from the processing of other
+partially grouped paths.
--
2.31.0
Here is an update of the patchset with the following changes:
* Fix an 'Aggref found where not expected' error caused by the PVC call
in is_var_in_aggref_only. This could happen when Aggrefs are contained
in other expressions; a hypothetical reproducer is sketched below.
* Use the joinrel's relids rather than the union of the relids of its
outer and inner rels to search for its grouped rel. This is more
correct, as we need to take OJs into consideration.
* Remove RelAggInfo.agg_exprs as it is not used anymore.
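For reference, the first issue could be triggered by a query that has an
Aggref nested inside another expression, e.g. (a hypothetical reproducer
using the tables from the regression tests, not taken from the patch):

    SET enable_eager_aggregate TO on;
    SELECT t1.a, sum(t2.c) + 1
    FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t1.a;

Here the Aggref appears wrapped in an OpExpr rather than at the top level
of the targetlist entry.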
Thanks
Richard
Attachments:
v6-0001-Introduce-RelInfoList-structure.patch (application/octet-stream)
From 9398f129e74c9c7e9dea8b85a2166f5dfa589bc2 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v6 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4053cd641c..bfced61422 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cc51ae1757..ffc6edd6c7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3415,9 +3415,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 075d36c7ec..eb78e37317 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e05b21c884..8279ab0e11 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ *	  Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1469,22 +1497,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1503,7 +1523,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b8141f141a..c696824f5c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +426,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
v6-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch (application/octet-stream)
From 388befb0b73fb7f1b2c6409156f322366185d3f3 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v6 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8279ab0e11..8420b8936e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -484,7 +484,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +504,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +532,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ * Find an RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +565,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +573,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,32 +599,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1503,7 +1527,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c696824f5c..816c41ed8c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1074,6 +1074,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. it is not
+ * initialized when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+	/* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
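To make the two targets in a RelAggInfo concrete, consider the partial
aggregate on "b" from the README example in 0009 (a rough, hypothetical
illustration, not output of the patch):

    agg_input: {b.j (sortgroupref set), b.y}
    target:    {b.j, PARTIAL avg(b.y)}

"agg_input" is what the paths feeding the partial aggregation must emit
(the grouping column plus the aggregate argument), while "target" is what
the grouped paths themselves emit (the grouping column plus the transient
aggregate state, which only the upper Finalize node can consume).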
v6-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (application/octet-stream)
From f92e4774e00411423f22116959431ef14e392f61 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v6 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ffc6edd6c7..586c0e07c0 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..0281336469 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+	 * SRFs are not allowed in the aggregate argument, and we don't even want
+	 * them in the GROUP BY clause, so forbid them in general.  It would need
+	 * to be analyzed whether evaluating a GROUP BY clause containing SRFs
+	 * below the query targetlist is correct.  Currently that does not seem
+	 * to be an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the operator in the btree's opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index eb78e37317..197a3f905e 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -263,6 +265,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 3fd0b14dd8..5ed01f7914 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 2166ea4a87..27b6515cd3 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 816c41ed8c..7c4ade0bef 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -386,6 +386,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3207,6 +3213,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 39ba461548..8f2bd60d47 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 2f3eb4e7f1..b6f4f6686c 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -136,6 +136,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -157,7 +158,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
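The NUMERIC example mentioned in create_grouping_expr_infos can be seen
directly in SQL (an illustrative snippet, not part of the patch): the
equality operator considers the two values equal even though their byte
images differ, so merging them into one group below the join would lose
the scale:

    SELECT '0'::numeric = '0.0'::numeric AS equal,
           '0'::numeric::text  AS image1,
           '0.0'::numeric::text AS image2;

     equal | image1 | image2
    -------+--------+--------
     t     | 0      | 0.0
    (1 row)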
v6-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (application/octet-stream)
From 8d16ab4f55fed03a509e5a921e7255026e7bf5fc Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v6 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 636 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 674 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 21ce1ae2e1..9369acf033 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2454,15 +2454,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2477,6 +2479,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2505,8 +2518,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2547,7 +2559,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 197a3f905e..0ff0ca99cb 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8420b8936e..c6e2d417a8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -647,6 +655,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one
+ * exists) will be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2483,3 +2543,579 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ *	  Check if the given relation can produce grouped paths, and if so,
+ *	  return the information needed to create them.  The given relation is
+ *	  the non-grouped one, whose reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent must already
+ * have been created, if that was possible at all. So we can just use the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense without grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups the input rows will be aggregated into. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of a lower rel, depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} and ph_eval_at = {B, C}.
+ * The path "A JOIN (B JOIN C)" implies that the PHV is evaluated by
+ * "(B JOIN C)", while the path "(A JOIN B) JOIN C" must evaluate the PHV
+ * itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, it would
+ * have to be evaluated on some other grouped relation, and we do not
+ * support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The variable is needed for a join, but it is neither in the
+ * GROUP BY clause nor derivable from it using an EC. (Otherwise
+ * it would have been added to the targets above.) We need to
+ * construct a special SortGroupClause for this variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we must postpone creation of the
+ * SortGroupClauses until we're done iterating over
+ * rel->reltarget->exprs. It also makes sense for the caller to
+ * perform some more checks before it starts to create those
+ * SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var itself won't actually be used to evaluate the grouping
+ * key (the expression it depends on will be), so its sortgroupref
+ * should not matter.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If eager aggregation supports generic grouping expressions in
+ * the future, create_rel_agg_info() will have to add this
+ * variable to the "agg_input" target and also add the whole
+ * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ /*
+ * If we reached the end of the list, the Var is not referenced in
+ * aggregate expressions.
+ */
+ if (lc == NULL)
+ return false;
+
+ /*
+ * Search the targetlist to see if the Var is referenced anywhere other
+ * than in aggregate expressions.
+ */
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ foreach(lc, tlist_exprs)
+ {
+ Var *tlist_var = (Var *) lfirst(lc);
+
+ if (IsA(tlist_var, Aggref))
+ continue;
+
+ if (equal(tlist_var, var))
+ {
+ list_free(tlist_exprs);
+ return false;
+ }
+ }
+
+ list_free(tlist_exprs);
+
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be used as a grouping key, because otherwise it would not
+ * be available to evaluate the join clause above the partial aggregation.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking whether the Var is needed by joins above, we
+ * want to exclude the case where the Var is needed only in the final
+ * output. So we include "relation 0", which represents the final
+ * targetlist, in the relids we subtract from attr_needed.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known, via equivalence
+ * classes, to be equal to any expression that can act as a grouping
+ * expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..877a62a62e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 7c4ade0bef..ac639abe31 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -434,6 +434,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * List of RelAggInfos for grouped relations: one RelAggInfo per item of
+ * the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c5c4756b0f..d973bff8ff 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -313,6 +313,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -347,4 +351,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8f2bd60d47..31eed6b6a8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -162,7 +162,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
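
As a concrete illustration of what this patch computes (a sketch only;
the table names come from the comment in is_var_needed_by_join(), and
the exact targets depend on the planner's earlier decisions), consider:

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

For base relation "b", create_rel_agg_info() is expected to build a
RelAggInfo whose "agg_input" target emits b.j and b.y, and whose grouped
"target" emits b.j plus the partial-state Aggref for avg(b.y): b.j is
added as an extra grouping expression because the join clause needs it,
while b.y ends up only in "agg_input", since it is referenced by the
aggregate alone (see is_var_in_aggref_only()).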
v6-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 65cb86a4ec954786c9d6533c13ea71ba76224372 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v6 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
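To illustrate (continuing the a/b example from the previous patch, and
assuming a grouped relation for "b" with grouping key b.j exists), the
partial aggregation added here looks like this fragment of a plan:

    Partial HashAggregate
      Group Key: b.j
      ->  Seq Scan on b

and, when the grouping key is sortable, a sort-based variant is built on
top of the cheapest or suitably presorted input paths. A projection is
placed below the aggregation so the input emits exactly the "agg_input"
target, and no HAVING qual is attached, since HAVING cannot be evaluated
until final aggregation.
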
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 586c0e07c0..3f3dbc486e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3308,6 +3310,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate the number of groups for both total and partial input paths. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3cf1dac087..70fa25a67b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 31eed6b6a8..947f814f4f 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
v6-0006-Build-grouped-relations-out-of-base-relations.patch
From 667c3c7de368090dd106c3a199874c20c4639bcb Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v6 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
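A short example of the effect (names invented; whether the grouped rel
survives into the final plan is decided later by costing):

    SELECT t1.a, sum(t2.c)
    FROM t1 JOIN t2 ON t1.b = t2.b
    GROUP BY t1.a;

Here setup_base_grouped_rels() builds a grouped rel only for t2:
sum(t2.c) can be evaluated at t2, and t2.b is usable as a grouping key
because the join needs it. No grouped rel is built for t1, where the
aggregate cannot be evaluated, and nothing is built at all for
single-relation queries, where eager aggregation cannot help.
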
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3f3dbc486e..ef699ab630 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c6e2d417a8..b14f99a9ea 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -418,6 +422,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have aggregate expressions and grouping expressions available;
+ * otherwise we could not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index d973bff8ff..d4b4499db3 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -309,6 +309,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
v6-0007-Build-grouped-relations-out-of-join-relations.patch
From 552892b0b78128392d2adb6bae2d367316f07885 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v6 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and GEQO.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
---
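For instance, with the enable_eager_aggregate GUC (introduced elsewhere
in this patch set) turned on, the t1/t2 query from the previous patch's
notes may produce a plan of roughly this shape (a sketch only; the
actual plan depends on statistics and costing):

    Finalize HashAggregate
      Group Key: t1.a
      ->  Hash Join
            Hash Cond: (t2.b = t1.b)
            ->  Partial HashAggregate
                  Group Key: t2.b
                  ->  Seq Scan on t2
            ->  Hash
                  ->  Seq Scan on t1
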
src/backend/optimizer/geqo/geqo_eval.c | 84 ++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 122 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 84 +++++++++++-----
src/backend/optimizer/util/appendinfo.c | 60 ++++++++++++
src/backend/optimizer/util/relnode.c | 2 -
src/include/nodes/pathnodes.h | 6 --
7 files changed, 355 insertions(+), 51 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+ * the original hash tables, if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ef699ab630..0e2c984442 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3866,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3886,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4754,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index f3a9412d18..ba1d15e85a 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,114 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported: grouping
+ * one side would change how many times the other side's aggregate
+ * transient states appear in the input of the final aggregation. This
+ * could be handled by adjusting the transient states, but it's not
+ * worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1671,6 +1788,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..4a6386a09d 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3913,9 +3912,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3996,23 +3993,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6843,16 +6838,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
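+ * ("Non-split" means that grouping is done in a single phase on top of
+ * the cheapest input path, without partial aggregation.)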
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -6964,7 +6985,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -6981,7 +7002,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -6989,7 +7010,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7031,19 +7052,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7093,6 +7112,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
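+ * If so, we reuse it below instead of building a new one.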
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7116,19 +7142,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..08de77d439 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
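+ * The generic expression mutator does not know this node type, and the
+ * sortgrouprefs array must be kept in step with the translated exprs.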
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* Translate the exprs even when there are no sortgrouprefs to copy */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b14f99a9ea..6087a14a76 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2833,8 +2833,6 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
add_column_to_pathtarget(target, (Expr *) aggref, 0);
-
- result->agg_exprs = lappend(result->agg_exprs, aggref);
}
/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ac639abe31..2c93dc3241 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1116,9 +1116,6 @@ typedef struct RelOptInfo
* "group_clauses", "group_exprs" and "group_pathkeys" are lists of
* SortGroupClause, the corresponding grouping expressions and PathKey
* respectively.
- *
- * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
- * paths.
*/
typedef struct RelAggInfo
{
@@ -1154,9 +1151,6 @@ typedef struct RelAggInfo
List *group_exprs;
/* a list of PathKeys */
List *group_pathkeys;
-
- /* a list of Aggref nodes */
- List *agg_exprs;
} RelAggInfo;
/*
--
2.31.0
Attachment: v6-0008-Add-test-cases.patch (application/octet-stream)
From 44cbdd2b6fadf10c4f6e50665038c693e4d59977 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v6 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, partially aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, partially aggregate the result, join it to the third table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY with another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 675c567617..0f6b3e78a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Scan one table, partially aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Join two tables, partially aggregate the result, join it to the third table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY with another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
v6-0009-Add-README.patch
From cfc9124cd774b5364925690d10627f86a16b080c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v6 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as
+input to the Agg node. However, if the groups are large enough, it may be
+more efficient to apply partial aggregation to the output of a base relation
+scan, and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, so
+the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which in turn can lead to a considerably lower join cost.
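+
+For instance, if the scan of "b" produced 1,000,000 rows falling into 1,000
+groups, the join would receive only 1,000 rows from that side instead of
+1,000,000. (These numbers are hypothetical and purely illustrative.)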
+
+Note that the query's GROUP BY expression might not be usable for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class
+{a.i, b.j} allows us to use the b.j column as the grouping key for the
+partial aggregation of the "b" table. The equivalence class mechanism fits
+here because it's designed to derive join clauses, and the join clauses in
+turn determine the choice of grouping columns for the partial aggregate: the
+only way for the partial aggregate to provide the upper join(s) with input
+values is to include the join input expression(s) in its grouping key.
+Besides the grouping columns, the partial aggregate can only produce the
+transient states of the aggregate functions, and those cannot be referenced
+by join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row belonging to that group
+does. In other words, all rows belonging to the same group have the same
+values of the join columns. (As mentioned above, a join cannot reference any
+output expression of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the
+relation to which we want to apply the partial aggregation. The point is that
+those NULL values would not appear in the input of the pushed-down aggregate,
+so it could either put rows into groups differently than the aggregate at the
+top of the plan does, or compute wrong values of the aggregate functions.
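+
+For example (a hypothetical query against the same tables "a" and "b" as
+above), the grouping column b.j can be set to NULL by the outer join above
+"b", so the partial aggregation cannot be pushed down to "b":
+
+ SELECT b.j, count(*)
+ FROM a LEFT JOIN b ON a.i = b.j
+ GROUP BY b.j;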
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it
+is added to a separate RelOptInfo that we call a "grouped relation". A
+grouped relation can be joined to a non-grouped relation, and the result is
+a grouped relation too. A join of two grouped relations does not seem to be
+very useful and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, those
+paths are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from the processing of other
+partially grouped paths.
--
2.31.0
Another rebase is needed after d1d286d83c. Also, I realized that the
partially_grouped_rel generated by eager aggregation might be dummy, as in
this query:

select count(t2.c) from t t1 join t t2 on t1.b = t2.b
where false group by t1.a;
If we somehow choose this dummy path, with a Finalize Agg path on top of it,
as the final cheapest path (a very rare case), we would encounter the
"Aggref found in non-Agg plan node" error. The v7 patch fixes this issue.
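
For reference, a minimal script that produces such a dummy grouped rel looks
roughly like this (the table definition is made up for illustration; whether
the dummy path actually ends up as the cheapest one still depends on
costing):

create table t (a int, b int, c int);
set enable_eager_aggregate = on;
explain (costs off)
select count(t2.c) from t t1 join t t2 on t1.b = t2.b
where false group by t1.a;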
Thanks
Richard
Attachments:
v7-0001-Introduce-RelInfoList-structure.patch
From 10ad693ef379979cd6794cfc0a805d4431ada9c9 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v7 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 4053cd641c..bfced61422 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4895cee994..70e2b58d8f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3403,9 +3403,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..fd8b2b0ca3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e05b21c884..8279ab0e11 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1469,22 +1497,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1503,7 +1523,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ef296ab7..4c7c6bc7a8 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation-specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +426,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
v7-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From dc1f62ad81396b150c00c277cad1e1d041032707 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v7 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8279ab0e11..8420b8936e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookups of RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups of RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -484,7 +484,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation-specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +504,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +532,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find a RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +565,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +573,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,32 +599,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1503,7 +1527,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4c7c6bc7a8..9a2bf98ae2 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1074,6 +1074,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
v7-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From 16456744ff3412f18ce4024913c1c82f1d28989f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v7 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 70e2b58d8f..d1b974367b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..0281336469 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't want them in
+ * the GROUP BY clause either, so forbid them in general. It would need
+ * analysis whether evaluating a GROUP BY clause containing an SRF below
+ * the query targetlist is correct. Currently that does not seem to be an
+ * important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the operator in the btree's opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd8b2b0ca3..5d2bca914b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -258,6 +260,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 46c258be28..aa7641d133 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 83d5df8e46..94ab3e6582 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9a2bf98ae2..9e785816e6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -386,6 +386,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3208,6 +3214,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 914d9bdef5..5181220263 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13d4..5e2b19d693 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -136,6 +136,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -156,7 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
v7-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch
From 1da20a17dde21b1061f94e59a127125b1230bfa7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v7 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create a RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 636 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 674 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 21ce1ae2e1..9369acf033 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2454,15 +2454,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2477,6 +2479,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2505,8 +2518,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2547,7 +2559,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5d2bca914b..f7217d7690 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8420b8936e..c6e2d417a8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -647,6 +655,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one
+ * exists) will be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2483,3 +2543,579 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths and return the
+ * information it'll need for it. The given relation is the non-grouped one
+ * which has the reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must have
+ * already been created if that was possible. So we can just use the
+ * parent's RelAggInfo if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes the assignment of tleSortGroupRefs.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+	 * current rel or in that of a lower rel --- depending on the input paths.
+	 * For example, consider rel->relids = {A, B, C} and ph_eval_at = {B, C}.
+	 * The path "A JOIN (B JOIN C)" implies that the PHV is evaluated by
+	 * "(B JOIN C)", while the path "(A JOIN B) JOIN C" must evaluate the PHV
+	 * itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+		 * If the aggregate does not need the current rel at all, then the
+		 * aggregate would have to be evaluated on some other grouped
+		 * relation, and we do not support a join of two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+				 * The variable is needed for a join, but it's neither in the
+				 * GROUP BY clause nor derivable from it using ECs.
+				 * (Otherwise it would have been added to the targets above.)
+				 * We need to construct a special SortGroupClause for this
+				 * variable.
+				 *
+				 * Note that its tleSortGroupRef needs to be unique within
+				 * agg_input, so we must postpone creating the
+				 * SortGroupClauses until we're done iterating over
+				 * rel->reltarget->exprs. It also makes sense for the caller
+				 * to perform some more checks before it starts to create
+				 * those SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+			 * The var shouldn't actually be used for grouping key evaluation
+			 * (instead, the expression it depends on will be), so its
+			 * sortgroupref is not important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+			 * As long as the query is semantically correct, arriving here
+			 * means that the var is referenced by a generic grouping
+			 * expression but not by any join.
+			 *
+			 * If eager aggregation supports generic grouping expressions in
+			 * the future, create_rel_agg_info() will have to add this
+			 * variable to the "agg_input" target and also add the whole
+			 * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ /*
+ * If we reached the end of the list, the Var is not referenced in
+ * aggregate expressions.
+ */
+ if (lc == NULL)
+ return false;
+
+ /*
+ * Search the targetlist to see if the Var is referenced anywhere other
+ * than in aggregate expressions.
+ */
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ foreach(lc, tlist_exprs)
+ {
+ Var *tlist_var = (Var *) lfirst(lc);
+
+ if (IsA(tlist_var, Aggref))
+ continue;
+
+ if (equal(tlist_var, var))
+ {
+ list_free(tlist_exprs);
+ return false;
+ }
+ }
+
+ list_free(tlist_exprs);
+
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as a grouping key because otherwise its values
+ * cannot find their way to the input of the join clause.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+	 * want to exclude the situation where the Var is only needed in the final
+	 * output. So we include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known, through
+ * equivalence relationships, to be equal to some other expression that can
+ * act as a grouping expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+	/* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..877a62a62e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 9e785816e6..1a1a1b6dfb 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -434,6 +434,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 112e7c23d4..02da68a753 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -348,4 +352,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5181220263..1068ff6953 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -159,7 +159,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
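To make what 0004 computes a bit more concrete: conceptually, eager
aggregation evaluates the query from the is_var_needed_by_join() comment
above

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

roughly as if it had been written as below. This is only an illustrative
sketch: partial_avg/final_avg are made-up names standing for the internal
partial/final aggregate split, not real SQL, and the planner of course
transforms paths, not the query text.

    SELECT a.i, final_avg(pb.pagg)
    FROM a
    JOIN (SELECT b.j,                       -- b.j becomes a grouping key
                 partial_avg(b.y) AS pagg   -- partial aggregation below the join
          FROM b
          GROUP BY b.j) pb
      ON a.j = pb.j
    GROUP BY a.i;

The extra grouping key b.j is what init_grouping_targets() collects into
group_exprs_extra_p, and the two targets it builds correspond to the inner
SELECT's output ("target") and the input of the partial aggregation
("agg_input").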
v7-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From ef1b95894276bb3225a15c801201a78067a5cf4a Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v7 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d1b974367b..0c2fae9608 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3296,6 +3298,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+	/* Estimate the number of groups, for both the non-partial and partial cases. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3491c3af1c..977c0ea4eb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 1068ff6953..74015b4ed8 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
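With 0005 in place, a grouped relation gets sort- and hash-based partial
aggregation paths built on top of the plain relation's paths. Once the
later patches hook the grouped rels into the join search, a query like the
one used in the 0008 tests

    SET enable_eager_aggregate TO on;

    EXPLAIN (COSTS OFF)
    SELECT t1.a, avg(t2.c)
    FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t1.a;

should produce a plan of roughly this shape (illustrative only; the join
method, the sorted-vs-hashed choice and the scan types all depend on
costing):

    Finalize GroupAggregate
      Group Key: t1.a
      ->  ... join ...
            ->  ... scan on eager_agg_t1 t1
            ->  Partial HashAggregate (or Partial GroupAggregate)
                  Group Key: t2.b
                  ->  ... scan on eager_agg_t2 t2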
v7-0006-Build-grouped-relations-out-of-base-relations.patch
From e982f62ea075a908e78dad894739e1a190cc1a5f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v7 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0c2fae9608..9219815e3d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+		 * Ignore anything that is not a simple rel. Note that simple rels
+		 * include "other rels", which we do want to consider here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c6e2d417a8..b14f99a9ea 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -418,6 +422,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We cannot get here unless aggregate expressions and grouping
+	 * expressions are available.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 02da68a753..525481f296 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
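A note on when 0006 actually builds a grouped base rel: every aggregate in
the query must be evaluable at that base rel alone, per
eager_aggregation_possible_for_relation() from 0004. Reusing the 0008
regression tables for illustration:

    -- avg(t2.c) needs only t2, so a grouped rel can be built for t2:
    SELECT t1.a, avg(t2.c)
    FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t1.a;

    -- avg(t1.c + t2.c) needs both sides, so no grouped base rel is built
    -- for either t1 or t2:
    SELECT t1.a, avg(t1.c + t2.c)
    FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t1.a;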
v7-0007-Build-grouped-relations-out-of-join-relations.patch
From 2a9dc93c243ff21c74719cc67b1723b7c6dc5880 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v7 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
---
src/backend/optimizer/geqo/geqo_eval.c | 84 ++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 122 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 92 +++++++++++++-----
src/backend/optimizer/util/appendinfo.c | 60 ++++++++++++
src/backend/optimizer/util/relnode.c | 2 -
src/include/nodes/pathnodes.h | 6 --
7 files changed, 363 insertions(+), 51 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+	 * Restore each of the lists in join_rel_list, agg_info_list and
+	 * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+	 * back the original hash tables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9219815e3d..359eee3486 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3854,6 +3854,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3874,6 +3878,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4742,6 +4767,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index f3a9412d18..ba1d15e85a 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,114 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+	 * A join of two grouped relations is currently not supported. In such a
+	 * case, grouping one side would change how many times the other side's
+	 * aggregate transient states appear in the input of the final
+	 * aggregation. This could be handled by adjusting the transient states,
+	 * but it's not worth the effort for now.
+ * worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1671,6 +1788,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 032818423f..64e8e5bb91 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3910,9 +3909,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3993,23 +3990,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6877,16 +6872,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -6998,7 +7019,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7015,7 +7036,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7023,7 +7044,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7065,19 +7086,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7127,6 +7146,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+	 * The partially_grouped_rel could already have been created due to eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+	 * It is possible that the partially_grouped_rel created by eager
+	 * aggregation is dummy. In that case, just reset it to NULL; the logic
+	 * below may create it again if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7150,19 +7184,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+	 * Note that the partially_grouped_rel could already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..08de77d439 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* Translate the target expressions */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index b14f99a9ea..6087a14a76 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2833,8 +2833,6 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
add_column_to_pathtarget(target, (Expr *) aggref, 0);
-
- result->agg_exprs = lappend(result->agg_exprs, aggref);
}
/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1a1a1b6dfb..2b378665ba 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1116,9 +1116,6 @@ typedef struct RelOptInfo
* "group_clauses", "group_exprs" and "group_pathkeys" are lists of
* SortGroupClause, the corresponding grouping expressions and PathKey
* respectively.
- *
- * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
- * paths.
*/
typedef struct RelAggInfo
{
@@ -1154,9 +1151,6 @@ typedef struct RelAggInfo
List *group_exprs;
/* a list of PathKeys */
List *group_pathkeys;
-
- /* a list of Aggref nodes */
- List *agg_exprs;
} RelAggInfo;
/*
--
2.31.0
v7-0008-Add-test-cases.patch
From 99feec748ac9b12e8973bd10a70b38274e6817a3 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v7 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 969ced994f..06362ae1e7 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
v7-0009-Add-README.patch
From 897a2b1c162e340b8f6aea16ada68ed40e7bf727 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v7 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of an Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and to finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which in turn can lead to a considerably lower join cost.
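+
+As a rough illustration: if "b" holds 1,000,000 rows spread over 1,000
+distinct values of b.j, the join above the partial aggregate receives about
+1,000 rows from that side instead of 1,000,000.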
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate:
+the only way for the partial aggregate to supply the upper join(s) with input
+values is to include the join input expression(s) in its grouping key.
+Besides grouping columns, the partial aggregate can only produce the
+transient states of the aggregate functions, and aggregate functions cannot
+be referenced by join clauses.
+
+Regarding correctness, the join node treats the output of the partial
+aggregate as equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does. In other words, all rows belonging to the same group have the
+same values of the join columns. (As mentioned above, a join cannot reference
+any output expressions of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either a grouping expression
+or an aggregate function can be set to NULL by an outer join above the
+relation to which we want to apply the partial aggregation. The point is that
+those NULL values would not appear in the input of the pushed-down aggregate,
+so it could either put rows into groups differently than the aggregate at the
+top of the plan would, or it could compute wrong values of the aggregate
+functions.
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it's
+added to a separate RelOptInfo that we call a "grouped relation". A grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. A join of two grouped relations does not seem very useful and
+is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+paths are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Their
+further processing does not differ from that of other partially grouped
+paths.
--
2.31.0
On Mon, May 20, 2024 at 4:12 PM Richard Guo <guofenglinux@gmail.com> wrote:
Another rebase is needed after d1d286d83c. I also realized that the
partially_grouped_rel generated by eager aggregation might be dummy, as in
this query:

select count(t2.c) from t t1 join t t2 on t1.b = t2.b where false group by t1.a;

If we somehow choose this dummy path, with a Finalize Agg path on top of it,
as the final cheapest path (a very rare case), we would encounter the
"Aggref found in non-Agg plan node" error. The v7 patch fixes this issue.
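For reference, a minimal reproduction sketch for that scenario (this assumes
the same toy table t(a, b, c) used in the repros below; it is an
illustration, not a test case taken from the patchset):

create table t (a int, b int, c int);
set enable_eager_aggregate to on;
explain (costs off)
select count(t2.c) from t t1 join t t2 on t1.b = t2.b where false group by t1.a;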
I spent some time testing this patchset and found a few more issues.
One issue is that partially-grouped partial paths may have already been
generated while building up the grouped join relations by eager
aggregation, in which case the partially_grouped_rel would contain valid
partial paths by the time we reach create_partial_grouping_paths. If we
subsequently find that parallelism is not possible for the
partially_grouped_rel, we need to drop those partial paths; otherwise we
risk tripping Assert(subpath->parallel_safe) when creating a gather or
gather merge path. This issue can be reproduced with the query below on
the v7 patch.
create function parallel_restricted_func(a int) returns int as
$$ begin return a; end; $$ parallel restricted language plpgsql;
create table t (a int, b int, c int) with (parallel_workers = 2);
set enable_eager_aggregate to on;
explain (costs off)
select parallel_restricted_func(1) * count(t2.c)
from t t1, t t2 where t1.b = t2.b group by t2.c;
Another issue I found is that when we check whether a given Var appears
only within Aggrefs, we need to account for the havingQual in addition to
the targetlist; otherwise we risk omitting this Var from the targetlist of
the partial Agg node, leading to 'ERROR: variable not found in subplan
target list'. This error can be reproduced with the query below on v7.
create table t (a int primary key, b int, c int);
set enable_eager_aggregate to on;
explain (costs off)
select count(*) from t t1, t t2 group by t1.a having min(t1.b) < t1.b;
ERROR: variable not found in subplan target list
A third issue I found is that with v7 we might push the Partial Agg down to
the nullable side of an outer join, which is not correct. This happens
because, when determining whether a Partial Agg can be pushed down to a
relation, the v7 patchset checks whether the aggregate expressions can be
evaluated at that relation level, but it overlooks checking the grouping
expressions. The grouping expressions can originate from two sources: the
original GROUP BY clauses, or expressions constructed from join conditions.
In either case, we must verify that the grouping expressions cannot be
nulled by outer joins above the current relation; otherwise the Partial Agg
cannot be pushed down to this rel. A query shape illustrating this is
sketched below.
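To illustrate the problematic shape (a hypothetical example, not a confirmed
repro from the patchset): here t2 is on the nullable side of the left join,
so t2.c can be nulled above t2, and a Partial Agg pushed down to t2 would
group rows before the join null-extends them:

create table t (a int, b int, c int);
set enable_eager_aggregate to on;
explain (costs off)
select t2.c, count(*) from t t1 left join t t2 on t1.b = t2.b group by t2.c;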
Hence here is the v8 patchset, with fixes for all the above issues.
Thanks
Richard
Attachments:
v8-0001-Introduce-RelInfoList-structure.patch
From 2b21b38b087be75f1b698d78cf8af4fea0ca02d7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v8 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0bb9a5ae8f..e82e1bb558 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4895cee994..70e2b58d8f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3403,9 +3403,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..fd8b2b0ca3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e05b21c884..8279ab0e11 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1469,22 +1497,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1503,7 +1523,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..0805de64d5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +426,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.43.0
v8-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch
From 8c32d3172dfb61d1486938ae1679d3bf0765db22 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:01:26 +0900
Subject: [PATCH v8 2/9] Introduce RelAggInfo structure to store info for
grouped paths
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8279ab0e11..8420b8936e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookups for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -484,7 +484,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +504,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +532,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find a RelOptInfo entry.
+ * Find a RelOptInfo or RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +565,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +573,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,32 +599,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1503,7 +1527,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0805de64d5..18d1ae8cbc 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1078,6 +1078,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Code iterating over ->exprs may
+ * rely on this arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from the reltarget of the non-grouped
+ * relation is that some items can have their sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.43.0
v8-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch
From f683049af871631c15a587af6c329a381e5d30fb Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:03:00 +0900
Subject: [PATCH v8 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 70e2b58d8f..d1b974367b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..0281336469 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't even want
+ * them in the GROUP BY clause, so forbid them in general. It would need
+ * analysis whether evaluating a GROUP BY clause containing an SRF below
+ * the query targetlist is correct. Currently this does not seem to be an
+ * important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the operator in the btree's opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd8b2b0ca3..5d2bca914b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -258,6 +260,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 46c258be28..aa7641d133 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e0567de219..961e8c3f92 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 18d1ae8cbc..683ab51e6b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -386,6 +386,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3219,6 +3225,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a..d8199333c9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index dbfd0c13d4..5e2b19d693 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -136,6 +136,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -156,7 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.43.0
v8-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch
From 5a2f11dac0bfa301d9bb85a25ade7ce8ce543024 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:04:41 +0900
Subject: [PATCH v8 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/initsplan.c | 24 +-
src/backend/optimizer/plan/planmain.c | 4 +
src/backend/optimizer/util/relnode.c | 647 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 11 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
8 files changed, 704 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326e..d871396e20 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2443,15 +2443,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2466,6 +2468,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2494,8 +2507,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2536,7 +2548,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 0281336469..4f30afd615 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -381,8 +381,8 @@ setup_eager_aggregation(PlannerInfo *root)
return;
/*
- * Collect aggregate expressions that appear in targetlist and having
- * clauses.
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and having clauses.
*/
create_agg_clause_infos(root);
@@ -400,10 +400,9 @@ setup_eager_aggregation(PlannerInfo *root)
}
/*
- * Create AggClauseInfo for each aggregate.
- *
- * If any aggregate is not suitable, set root->agg_clause_list to NIL and
- * return.
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
*/
static void
create_agg_clause_infos(PlannerInfo *root)
@@ -412,6 +411,7 @@ create_agg_clause_infos(PlannerInfo *root)
ListCell *lc;
Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
PVC_INCLUDE_AGGREGATES |
@@ -455,10 +455,13 @@ create_agg_clause_infos(PlannerInfo *root)
AggClauseInfo *ac_info;
/*
- * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ * collect plain Vars for future reference
*/
if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
continue;
+ }
aggref = castNode(Aggref, expr);
@@ -477,10 +480,11 @@ create_agg_clause_infos(PlannerInfo *root)
}
/*
- * Create GroupExprInfo for each expression usable as grouping key.
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
*
- * If any grouping expression is not suitable, set root->group_expr_list to NIL
- * and return.
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
*/
static void
create_grouping_expr_infos(PlannerInfo *root)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5d2bca914b..ece6936e23 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -79,6 +82,7 @@ query_planner(PlannerInfo *root,
root->placeholder_array_size = 0;
root->agg_clause_list = NIL;
root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8420b8936e..27f779d778 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,15 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel,
+ bool *safe_to_push);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -647,6 +656,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one
+ * exists) will be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2483,3 +2544,589 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths, and if so,
+ * return the information needed to create them. The given relation is
+ * the non-grouped one, whose reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must have
+ * already been created if that was possible. So we can just use the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate the pathkeys that represent these grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of a lower rel --- depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} with ph_eval_at = {B, C}.
+ * The path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+ * "(B JOIN C)" part, while the path "(A JOIN B) JOIN C" evaluates the PHV
+ * itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * aggregate would be computed on some other relation, and grouping
+ * the current rel as well would result in a join of two grouped
+ * relations, which we do not support.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ /*
+ * Check if all grouping expressions that are applicable to this relation
+ * can be evaluated at this relation level.
+ */
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+ Var *ge_var = castNode(Var, ge_info->expr);
+
+ /*
+ * Not interested if the grouping expression is not applicable to this
+ * relation.
+ */
+ if (!bms_is_member(ge_var->varno, rel->relids))
+ continue;
+
+ /*
+ * Give up if any grouping expression can be nulled by an outer join
+ * above this relation.
+ */
+ if (!bms_is_subset(ge_var->varnullingrels, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those Vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ bool safe_to_push;
+
+ if (is_var_needed_by_join(root, (Var *) expr, rel, &safe_to_push))
+ {
+ /*
+ * Give up if this expression is not safe to be used as a
+ * grouping key at this relation level.
+ */
+ if (!safe_to_push)
+ return false;
+
+ /*
+ * The expression is needed for a join, but it's neither in the
+ * GROUP BY clause nor derivable from it using ECs.
+ * (Otherwise it would have already been added to the targets
+ * above.) We need to construct a special SortGroupClause for
+ * this expression.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of this
+ * SortGroupClause until we're done with the iteration of
+ * rel->reltarget->exprs. And it makes sense for the caller to
+ * do some more checks before it starts to create those
+ * SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't actually be used for grouping key evaluation
+ * (instead, the one it depends on will be), so its sortgroupref
+ * is not important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If eager aggregation is to support generic grouping
+ * expressions in the future, create_rel_agg_info() will have to
+ * add this variable to the "agg_input" target and also add the
+ * whole generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist and havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel. We also
+ * return in '*safe_to_push' whether it's safe to use this Var as a grouping
+ * key at this rel level.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be used as a grouping key, since otherwise it could not
+ * find its way up to the input of the join clause a.j = b.j.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel,
+ bool *safe_to_push)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * When checking if the Var is needed by joins above, we want to ignore
+ * the case where the Var is only needed in the final output. So include
+ * "relation 0", which represents the final output, here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ /*
+ * If the baserel this Var belongs to can be nulled by outer joins that are
+ * above the current rel, then it is not safe to use this Var as a grouping
+ * key at the current rel level.
+ */
+ *safe_to_push = bms_is_subset(baserel->nulling_relids, rel->relids);
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also check whether 'expr' is known, via equivalence
+ * relationships, to be equal to some expression that can act as a grouping
+ * expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..877a62a62e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 683ab51e6b..fd10498028 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -389,9 +389,12 @@ struct PlannerInfo
/* list of AggClauseInfos */
List *agg_clause_list;
- /* List of GroupExprInfos */
+ /* list of GroupExprInfos */
List *group_expr_list;
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -434,6 +437,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of RelAggInfos for grouped relations, one RelAggInfo per item
+ * of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 112e7c23d4..02da68a753 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -348,4 +352,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index d8199333c9..ae7a8ed742 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -159,7 +159,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.43.0
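Not part of the patch: to make 0004's target construction concrete, here is an
illustrative walkthrough reusing the example query from the comments above.
The table definitions are assumed.

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

    -- For base relation "b", create_rel_agg_info() would build roughly:
    --   target:    {b.j, PARTIAL avg(b.y)}   -- emitted by the grouped paths
    --   agg_input: {b.j, b.y}                -- emitted by the input paths
    -- b.j becomes a grouping key only because the join clause a.j = b.j
    -- needs it (see is_var_needed_by_join()).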
Attachment: v8-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 8a2fe74cc296423061cc84991c33cbeb9b48ade5 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:05:50 +0900
Subject: [PATCH v8 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
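As a rough sketch (the plan fragments below are drawn by hand, not taken from
actual EXPLAIN output), the two kinds of partial aggregation paths generated
here look like:

    -- sort-based (AGG_SORTED):
    --   Partial GroupAggregate
    --     Group Key: b.j
    --     ->  Sort (or an already-sorted input path)
    --           ->  Seq Scan on b
    --
    -- hash-based (AGG_HASHED):
    --   Partial HashAggregate
    --     Group Key: b.j
    --     ->  cheapest (partial) path, e.g. Seq Scan on b

In both cases the qual is left NIL, since the HAVING clause can only be
evaluated once the final aggregate values are known.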
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d1b974367b..0c2fae9608 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3296,6 +3298,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate the number of groups for non-partial and partial paths. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3491c3af1c..977c0ea4eb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ae7a8ed742..413c269091 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.43.0
Attachment: v8-0006-Build-grouped-relations-out-of-base-relations.patch
From 1e81f2f29be7cabb2d125574731ad4074caa2227 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:07:32 +0900
Subject: [PATCH v8 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
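For example (illustrative queries; the tables are assumed):

    -- eligible: multiple base rels, and the aggregate is evaluable at "b"
    SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;

    -- not eligible: a single base rel, so setup_base_grouped_rels() bails
    -- out early (see its BMS_MULTIPLE check)
    SELECT x, count(*) FROM t GROUP BY x;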
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0c2fae9608..9219815e3d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore anything that is not a simple rel. Note that IS_SIMPLE_REL
+ * also admits "other rels", which we do want to consider here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 27f779d778..f8f0c0fc69 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -419,6 +423,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have aggregate expressions and grouping expressions available;
+ * otherwise we could not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 02da68a753..525481f296 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.43.0
Attachment: v8-0007-Build-grouped-relations-out-of-join-relations.patch
From 10a5d6b4c1484160d5ee6a1e6959f0fe34eeab23 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:08:23 +0900
Subject: [PATCH v8 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise joins and GEQO.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
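For instance, with assumed table definitions and statistics, a plan like the
following becomes possible. (The output below is sketched by hand rather than
copied from an actual run, and the exact shape depends on costing;
enable_eager_aggregate is the GUC referenced elsewhere in this series.)

    SET enable_eager_aggregate = on;

    EXPLAIN (COSTS OFF)
    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

                    QUERY PLAN
    ---------------------------------------------
     Finalize HashAggregate
       Group Key: a.i
       ->  Hash Join
             Hash Cond: (a.j = b.j)
             ->  Seq Scan on a
             ->  Hash
                   ->  Partial HashAggregate
                         Group Key: b.j
                         ->  Seq Scan on b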
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++----
src/backend/optimizer/path/allpaths.c | 48 +++++++++
src/backend/optimizer/path/joinrels.c | 136 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 100 ++++++++++++-----
src/backend/optimizer/util/appendinfo.c | 60 +++++++++++
src/backend/optimizer/util/relnode.c | 2 -
src/include/nodes/pathnodes.h | 6 --
7 files changed, 385 insertions(+), 51 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+ * original hashtable if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9219815e3d..359eee3486 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3854,6 +3854,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3874,6 +3878,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4742,6 +4767,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index db475e25b1..78a88c9d3b 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,128 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
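+ *
+ * For example, when joining {A, B} we may consider Agg(A) JOIN B or
+ * A JOIN Agg(B), but not Agg(A) JOIN Agg(B), since a join of two grouped
+ * relations is not supported (see below).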
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported. In such a
+ * case, grouping one side would change how many times the other side's
+ * aggregate transition values appear in the input of the final
+ * aggregation. This could be handled by adjusting the transition values,
+ * but it's not worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ /*
+ * populate_joinrel_with_paths should not have marked rel1_grouped as
+ * dummy due to provably constant-false join restrictions, so we
+ * shouldn't end up with a plan that has an Aggref in a non-Agg plan
+ * node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ /*
+ * populate_joinrel_with_paths should not have marked rel2_grouped as
+ * dummy due to provably constant-false join restrictions, so we
+ * shouldn't end up with a plan that has an Aggref in a non-Agg plan
+ * node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1671,6 +1802,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4711f91239..b69efb3cd1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4082,23 +4079,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6967,16 +6962,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7088,7 +7109,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7105,7 +7126,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7113,7 +7134,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7155,19 +7176,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7217,6 +7236,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel could already have been created due to eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * It is possible that the partially_grouped_rel created by eager
+ * aggregation is dummy. In that case, just set it to NULL; the logic
+ * below may recreate it if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7240,19 +7274,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel could already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7261,6 +7303,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..08de77d439 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+		newtarget->exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+										   context);
+
+		if (oldtarget->sortgrouprefs)
+		{
+			Size		nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+			newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+			memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+		}
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f8f0c0fc69..91013e1a80 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2834,8 +2834,6 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
add_column_to_pathtarget(target, (Expr *) aggref, 0);
-
- result->agg_exprs = lappend(result->agg_exprs, aggref);
}
/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fd10498028..4ce70f256d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1123,9 +1123,6 @@ typedef struct RelOptInfo
* "group_clauses", "group_exprs" and "group_pathkeys" are lists of
* SortGroupClause, the corresponding grouping expressions and PathKey
* respectively.
- *
- * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
- * paths.
*/
typedef struct RelAggInfo
{
@@ -1161,9 +1158,6 @@ typedef struct RelAggInfo
List *group_exprs;
/* a list of PathKeys */
List *group_pathkeys;
-
- /* a list of Aggref nodes */
- List *agg_exprs;
} RelAggInfo;
/*
--
2.43.0
Attachment: v8-0008-Add-test-cases.patch (application/octet-stream)
From f613f9fe6acc471c869bd45bae61e15e4dfb2d91 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:16:15 +0900
Subject: [PATCH v8 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 969ced994f..06362ae1e7 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.43.0
Attachment: v8-0009-Add-README.patch (application/octet-stream)
From 44b6b774cb8b0eafc01067c3405cea093e91acdc Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:17:27 +0900
Subject: [PATCH v8 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be
+more efficient to apply partial aggregation to the output of a base relation
+scan, and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
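+
+For example, if, say, "b" holds 1000 rows that fall into 10 groups of b.j,
+the join above the Partial HashAggregate sees 10 rows from "b" instead of
+1000.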
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism fits well
+here because it is designed to derive join clauses, and the join clauses
+in turn determine the choice of grouping columns for the partial
+aggregate: the only way for the partial aggregate to provide the upper
+join(s) with input values is to include the join input expression(s) in
+its grouping key. Besides the grouping columns, the partial aggregate can
+only produce the transient states of the aggregate functions, and those
+cannot be referenced by join clauses.
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does. In other words, all rows belonging to the same group have the
+same values of the join columns. (As mentioned above, a join cannot reference
+any output expression of the partial aggregate other than the grouping ones.)
+
+However, there's a restriction from the aggregate's perspective: the
+aggregate cannot be pushed down if any column referenced by either a grouping
+expression or an aggregate function can be set to NULL by an outer join above
+the relation to which we want to apply the partial aggregation. The point is
+that those NULL values would not appear in the input of the pushed-down
+aggregate, so it could either put rows into groups differently than the
+aggregate at the top of the plan would, or compute wrong aggregate values.
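+
+For example (a hypothetical case; assume some rows of "a" have no match in
+"b"):
+
+  SELECT b.j, count(*)
+  FROM a LEFT JOIN b ON a.i = b.j
+  GROUP BY b.j;
+
+Rows of "a" without a match in "b" are NULL-extended above the join and fall
+into the NULL group of b.j. A partial aggregate applied to the scan of "b"
+would never see such rows, so finalizing it on top of the join would group
+the rows differently and produce wrong counts.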
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it
+is added to a separate RelOptInfo that we call a "grouped relation". A
+grouped relation can be joined to a non-grouped relation, and the result is
+a grouped relation too. A join of two grouped relations does not seem to be
+very useful and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
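+
+Note that eager aggregation is disabled by default; it can be enabled by
+setting the enable_eager_aggregate GUC parameter to on.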
--
2.43.0
On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
> I spent some time testing this patchset and found a few more issues.
> ...
> Hence here is the v8 patchset, with fixes for all the above issues.
I found an 'ORDER/GROUP BY expression not found in targetlist' error
with this patchset, with the query below:
create table t (a boolean);
set enable_eager_aggregate to on;
explain (costs off)
select min(1) from t t1 left join t t2 on t1.a
group by (not (not t1.a)), t1.a order by t1.a;
ERROR: ORDER/GROUP BY expression not found in targetlist
This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them. The fully-processed
groupClause is kept in root->processed_groupClause. However, when
collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.
The fix is straightforward: check root->processed_groupClause instead.
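In create_grouping_expr_infos, the fix is essentially this one-line change
(a sketch of the idea, not the exact hunk):

-    foreach(lc, root->parse->groupClause)
+    foreach(lc, root->processed_groupClause)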
Here is a new rebase with this fix.
Thanks
Richard
Attachments:
v9-0001-Introduce-RelInfoList-structure.patch (application/octet-stream)
From af0a498a243684478c2b08d9cb1dcf2d5a979a93 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v9 01/10] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 0bb9a5ae8f..e82e1bb558 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4895cee994..70e2b58d8f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3403,9 +3403,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..fd8b2b0ca3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e05b21c884..8279ab0e11 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1469,22 +1497,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1503,7 +1523,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 2ba297c117..0805de64d5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation-specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +426,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.43.0
v9-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch (application/octet-stream)
From 06e29a2206817a810baa4f9155f2d0732885a0f9 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:01:26 +0900
Subject: [PATCH v9 02/10] Introduce RelAggInfo structure to store info for
grouped paths
This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8279ab0e11..8420b8936e 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookups of RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups of RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -484,7 +484,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation-specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +504,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +532,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find a RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +565,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +573,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,32 +599,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation-specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1503,7 +1527,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0805de64d5..18d1ae8cbc 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1078,6 +1078,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one. This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations over ->exprs may rely on
+ * this arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. it is not
+ * initialized when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.43.0
v9-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (application/octet-stream)
From 5fc05389ba521c3ee73edda770fab39756f298be Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:03:00 +0900
Subject: [PATCH v9 03/10] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 70e2b58d8f..d1b974367b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -77,6 +77,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..4e51213410 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't even want them
+ * in the GROUP BY clause, so forbid them in general. It would need to be
+ * analyzed whether evaluating a GROUP BY clause containing an SRF below
+ * the query targetlist is correct; currently that does not seem to be an
+ * important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd8b2b0ca3..5d2bca914b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -77,6 +77,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -258,6 +260,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d28b0bcb40..d3b37af81a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -931,6 +931,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9ec9f97e92..3f03ef0438 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 18d1ae8cbc..683ab51e6b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -386,6 +386,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* List of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3219,6 +3225,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5e88c0224a..d8199333c9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 729620de13..46d6645bd8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -136,6 +136,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -156,7 +157,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
--
2.43.0
v9-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (application/octet-stream)
From 60437198177719782d13c60cd0309406045e707b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:04:41 +0900
Subject: [PATCH v9 04/10] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/initsplan.c | 24 +-
src/backend/optimizer/plan/planmain.c | 4 +
src/backend/optimizer/util/relnode.c | 647 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 11 +-
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
8 files changed, 704 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 51d806326e..d871396e20 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2443,15 +2443,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2466,6 +2468,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2494,8 +2507,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2536,7 +2548,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 4e51213410..9f05edfbac 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -381,8 +381,8 @@ setup_eager_aggregation(PlannerInfo *root)
return;
/*
- * Collect aggregate expressions that appear in targetlist and having
- * clauses.
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and having clauses.
*/
create_agg_clause_infos(root);
@@ -400,10 +400,9 @@ setup_eager_aggregation(PlannerInfo *root)
}
/*
- * Create AggClauseInfo for each aggregate.
- *
- * If any aggregate is not suitable, set root->agg_clause_list to NIL and
- * return.
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
*/
static void
create_agg_clause_infos(PlannerInfo *root)
@@ -412,6 +411,7 @@ create_agg_clause_infos(PlannerInfo *root)
ListCell *lc;
Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
PVC_INCLUDE_AGGREGATES |
@@ -455,10 +455,13 @@ create_agg_clause_infos(PlannerInfo *root)
AggClauseInfo *ac_info;
/*
- * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ * collect plain Vars for future reference
*/
if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
continue;
+ }
aggref = castNode(Aggref, expr);
@@ -477,10 +480,11 @@ create_agg_clause_infos(PlannerInfo *root)
}
/*
- * Create GroupExprInfo for each expression usable as grouping key.
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
*
- * If any grouping expression is not suitable, set root->group_expr_list to NIL
- * and return.
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
*/
static void
create_grouping_expr_infos(PlannerInfo *root)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5d2bca914b..ece6936e23 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -79,6 +82,7 @@ query_planner(PlannerInfo *root,
root->placeholder_array_size = 0;
root->agg_clause_list = NIL;
root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 8420b8936e..27f779d778 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,15 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel,
+ bool *safe_to_push);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -647,6 +656,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * will also be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2483,3 +2544,589 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths and return the
+ * information it'll need for it. The given relation is the non-grouped one
+ * which has the reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must have
+ * already been created if that was possible. So we can just use the
+ * parent's RelAggInfo if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of a lower rel --- depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} and ph_eval_at = {B, C}.
+ * Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+ * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * current rel should be grouped because we do not support join of two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ /*
+ * Check if all grouping expressions that are applicable to this relation
+ * can be evaluated on this relation level.
+ */
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+ Var *ge_var = castNode(Var, ge_info->expr);
+
+ /*
+ * Not interested if the grouping expression is not applicable to this
+ * relation.
+ */
+ if (!bms_is_member(ge_var->varno, rel->relids))
+ continue;
+
+ /*
+ * Give up if any grouping expression can be nulled by an outer join
+ * above this relation.
+ */
+ if (!bms_is_subset(ge_var->varnullingrels, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those Vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ bool safe_to_push;
+
+ if (is_var_needed_by_join(root, (Var *) expr, rel, &safe_to_push))
+ {
+ /*
+ * Give up if this expression is not safe to be used as a
+ * grouping key at this relation level.
+ */
+ if (!safe_to_push)
+ return false;
+
+ /*
+ * The expression is needed for a join; however, it's neither in
+ * the GROUP BY clause nor can it be derived from it using ECs.
+ * (Otherwise it would have already been added to the targets
+ * above.) We need to construct a special SortGroupClause for
+ * this expression.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of this
+ * SortGroupClause until we're done with the iteration of
+ * rel->reltarget->exprs. And it makes sense for the caller to
+ * do some more checks before it starts to create those
+ * SortGroupClauses.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't actually be used for grouping key evaluation
+ * (instead, the one it depends on will be), so its sortgroupref
+ * is not important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If eager aggregation comes to support generic grouping
+ * expressions in the future, create_rel_agg_info() will have to add
+ * this variable to the "agg_input" target and also add the whole
+ * generic expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist and havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel. We also
+ * return in '*safe_to_push' whether it's safe to use this Var as a grouping
+ * key at this rel level.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as a grouping key, because otherwise its
+ * values cannot find their way to the input of the join clause.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel,
+ bool *safe_to_push)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+ * want to exclude the situation where the Var is only needed in final
+ * output. So include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ /*
+ * If the baserel this Var belongs to can be nulled by outer joins that are
+ * above the current rel, then it is not safe to use this Var as a grouping
+ * key at current rel level.
+ */
+ *safe_to_push = bms_is_subset(baserel->nulling_relids, rel->relids);
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check whether 'expr' is known, through
+ * equivalence relationships, to be equal to any expression that can act
+ * as a grouping expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 5f5d7959d8..877a62a62e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 683ab51e6b..fd10498028 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -389,9 +389,12 @@ struct PlannerInfo
/* list of AggClauseInfos */
List *agg_clause_list;
- /* List of GroupExprInfos */
+ /* list of GroupExprInfos */
List *group_expr_list;
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -434,6 +437,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 112e7c23d4..02da68a753 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -348,4 +352,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index d8199333c9..ae7a8ed742 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -159,7 +159,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.43.0
Attachment: v9-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch
From 0df60b1a5567eee57b7437677d10d77c7060ba44 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:05:50 +0900
Subject: [PATCH v9 05/10] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
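To make the two strategies concrete, here is a sketch of the
partial-aggregation path shapes this builds on top of a plain relation's
paths. This is illustrative only, not actual EXPLAIN output; it assumes
a relation b(j, y) being grouped on b.j, as in the example used later in
this series:

    -- sort-based partial aggregation
    Partial GroupAggregate
      Group Key: b.j
      ->  Sort
            Sort Key: b.j
            ->  Seq Scan on b

    -- hash-based partial aggregation
    Partial HashAggregate
      Group Key: b.j
      ->  Seq Scan on b

When an input path is already partially sorted, an incremental sort is
considered instead of a full sort.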
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d1b974367b..0c2fae9608 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3296,6 +3298,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate the number of groups, for both total and partial input paths. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index c42742d2c7..8de9825f80 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ae7a8ed742..413c269091 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.43.0
Attachment: v9-0006-Build-grouped-relations-out-of-base-relations.patch
From 397840b64b5d1847d0dbb62990c002badbb9c202 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:07:32 +0900
Subject: [PATCH v9 06/10] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
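Conceptually, the grouped relation built for a base rel represents that
rel's rows after partial aggregation. Assuming the two-table example
used elsewhere in this series (tables a(i, j) and b(j, y), aggregate
avg(b.y)), the grouped relation for b corresponds to something like the
following pseudo-SQL, where partial_avg() is a stand-in for the
aggregate's transient state rather than a real function:

    SELECT b.j, partial_avg(b.y)  -- pseudo-SQL, not runnable as-is
    FROM b
    GROUP BY b.j;

The join column b.j serves as the grouping key so that the partially
aggregated rows can still participate in joins above.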
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0c2fae9608..9219815e3d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 27f779d778..f8f0c0fc69 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -419,6 +423,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have aggregate expressions and grouping expressions
+ * available, otherwise we shouldn't have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 02da68a753..525481f296 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.43.0
v9-0007-Build-grouped-relations-out-of-join-relations.patchapplication/octet-stream; name=v9-0007-Build-grouped-relations-out-of-join-relations.patchDownload
From 3635321462aee998a96d6a4c7a52fa71b0548335 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:08:23 +0900
Subject: [PATCH v9 07/10] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
The changes made to make_join_rel() are relatively minor, with the
addition of a new function make_grouped_join_rel(), which finds or
creates a grouped relation for the just-processed joinrel, and generates
grouped paths by joining a grouped input relation with a non-grouped
input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see
generate_grouped_paths()).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise joins and GEQO.
This patch also makes eager aggregation work with outer joins. With
outer joins, the aggregate cannot be pushed down if any column
referenced by grouping expressions or aggregate functions is nullable by
an outer join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
Starting from this patch, you should be able to see plans with eager
aggregation.
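For example, for the query below (tables a(i, j) and b(j, y), as in the
comments of patch 0004):

    SELECT a.i, avg(b.y)
    FROM a JOIN b ON a.j = b.j
    GROUP BY a.i;

a plan with eager aggregation could look roughly like this sketch
(illustrative only; the actual plan shape depends on statistics and
costs, and the planner may still prefer the non-pushed-down plan):

    Finalize HashAggregate
      Group Key: a.i
      ->  Hash Join
            Hash Cond: (a.j = b.j)
            ->  Seq Scan on a
            ->  Hash
                  ->  Partial HashAggregate
                        Group Key: b.j
                        ->  Seq Scan on b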
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++----
src/backend/optimizer/path/allpaths.c | 48 +++++++++
src/backend/optimizer/path/joinrels.c | 136 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 100 ++++++++++++-----
src/backend/optimizer/util/appendinfo.c | 60 +++++++++++
src/backend/optimizer/util/relnode.c | 2 -
src/include/nodes/pathnodes.h | 6 --
7 files changed, 385 insertions(+), 51 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+ * back the original hashtables if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9219815e3d..359eee3486 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3854,6 +3854,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3874,6 +3878,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4742,6 +4767,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index db475e25b1..78a88c9d3b 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,128 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /*
+ * If we've already proven this grouped join relation is empty, we needn't
+ * consider any more paths for it.
+ */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * A join of two grouped relations is currently not supported. In such
+ * a case, grouping one side would change how many times the other
+ * side's aggregate transient states occur in the input of the final
+ * aggregation. This could be handled by adjusting the transient
+ * states, but it's not worth the effort for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ /*
+ * populate_joinrel_with_paths shouldn't have marked rel1_grouped as
+ * dummy due to provably constant-false join restrictions, so we
+ * cannot end up with a plan that has an Aggref in a non-Agg plan
+ * node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ /*
+ * populate_joinrel_with_paths shouldn't have marked rel2_grouped as
+ * dummy due to provably constant-false join restrictions, so we
+ * cannot end up with a plan that has an Aggref in a non-Agg plan
+ * node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1671,6 +1802,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4711f91239..b69efb3cd1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4082,23 +4079,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6967,16 +6962,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7088,7 +7109,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7105,7 +7126,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7113,7 +7134,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7155,19 +7176,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7217,6 +7236,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel could have been already created due to eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * It is possible that the partially_grouped_rel created by eager
+ * aggregation is dummy. In this case we just set it to NULL; the
+ * logic below may then create it afresh if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7240,19 +7274,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel could already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7261,6 +7303,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 6ba4eba224..08de77d439 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
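
With RelAggInfo and PathTarget handled by this mutator, a child rel's
grouping info can be derived from its parent's via the usual appendinfo
translation. A minimal usage sketch in C (the variable names here are
illustrative, not taken from the patch):

    /* Translate a parent rel's RelAggInfo for one of its child rels. */
    AppendRelInfo **appinfos;
    int             nappinfos;
    RelAggInfo *child_agg_info;

    /* Find the AppendRelInfos that map parent Vars to this child. */
    appinfos = find_appinfos_by_relids(root, child_rel->relids, &nappinfos);

    /*
     * adjust_appendrel_attrs() drives the mutator above, so the relids,
     * both PathTargets, and the grouping lists come back rewritten in
     * terms of the child rel.
     */
    child_agg_info = (RelAggInfo *)
        adjust_appendrel_attrs(root, (Node *) parent_agg_info,
                               nappinfos, appinfos);

    pfree(appinfos);
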
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f8f0c0fc69..91013e1a80 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2834,8 +2834,6 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
add_column_to_pathtarget(target, (Expr *) aggref, 0);
-
- result->agg_exprs = lappend(result->agg_exprs, aggref);
}
/*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fd10498028..4ce70f256d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1123,9 +1123,6 @@ typedef struct RelOptInfo
* "group_clauses", "group_exprs" and "group_pathkeys" are lists of
* SortGroupClause, the corresponding grouping expressions and PathKey
* respectively.
- *
- * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
- * paths.
*/
typedef struct RelAggInfo
{
@@ -1161,9 +1158,6 @@ typedef struct RelAggInfo
List *group_exprs;
/* a list of PathKeys */
List *group_pathkeys;
-
- /* a list of Aggref nodes */
- List *agg_exprs;
} RelAggInfo;
/*
--
2.43.0
Attachment: v9-0008-Add-test-cases.patch (application/octet-stream)
From 029be0fd9d784bc21677e48375ebf57bab37ae55 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:16:15 +0900
Subject: [PATCH v9 08/10] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
3 files changed, 1486 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..7a28287522
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2429ec2bba..d5697e5655 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.43.0
Attachment: v9-0009-Add-README.patch (application/octet-stream)
From 03382212e277487b28108b394a0a41e386db732d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:17:27 +0900
Subject: [PATCH v9 09/10] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan
+and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to a considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key; besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
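+
+For instance, in the example above the join clause a.i = b.j places a.i and
+b.j in one equivalence class, so the partial aggregation of "b" behaves,
+conceptually, like
+
+  SELECT b.j, avg(b.y) FROM b GROUP BY b.j;
+
+except that it emits the transient state of avg(b.y) rather than its final
+value, which the Finalize step then combines after the join.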
+
+Regarding correctness, the join node considers the output of the partial
+aggregate to be equivalent to the output of a plain (non-aggregated) relation
+scan. That is, a group (i.e. a row of the partial aggregate output) matches
+the other side of the join if and only if each row of the non-aggregated
+relation does. In other words, all rows belonging to the same group have the
+same values of the join columns. (As mentioned above, a join cannot reference
+any output expression of the partial aggregate other than the grouping
+expressions.)
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by a grouping expression or an
+aggregate function can be set to NULL by an outer join above the relation to
+which we want to apply the partial aggregation. The point is that those NULL
+values would not appear in the input of the pushed-down aggregate, so it could
+either put the rows into groups in a different way than the aggregate at the
+top of the plan, or it could compute wrong values of the aggregate functions.
+
+Besides base relations, aggregation can also be pushed down to joins:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it's
+added to a separate RelOptInfo that we call a "grouped relation". A grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. A join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
--
2.43.0
Attachment: v9-0010-Run-pgindent.patch (application/octet-stream)
From 3f807e0601224654085f947bab5d4def3bde4fdf Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 3 Jul 2024 16:24:39 +0900
Subject: [PATCH v9 10/10] Run pgindent
---
src/backend/optimizer/geqo/geqo_eval.c | 19 ++++---
src/backend/optimizer/path/allpaths.c | 74 ++++++++++++-------------
src/backend/optimizer/path/joinrels.c | 20 ++++---
src/backend/optimizer/plan/initsplan.c | 24 ++++----
src/backend/optimizer/plan/planner.c | 8 +--
src/backend/optimizer/util/appendinfo.c | 2 +-
src/backend/optimizer/util/relnode.c | 69 +++++++++++------------
src/include/nodes/pathnodes.h | 10 ++--
src/tools/pgindent/typedefs.list | 4 ++
9 files changed, 119 insertions(+), 111 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 278857d767..2851bed282 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -87,8 +87,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* already contain some entries. The newly added entries will be recycled
* by the MemoryContextDelete below, so we must ensure that each list of
* the RelInfoList structures is restored to its former state before
- * exiting. We can do this by truncating each list to its original length.
- * NOTE this assumes that any added entries are appended at the end!
+ * exiting. We can do this by truncating each list to its original
+ * length. NOTE this assumes that any added entries are appended at the
+ * end!
*
* We also must take care not to mess up the outer hash tables of the
* RelInfoList structures, if any. We can do this by just temporarily
@@ -136,8 +137,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
/*
* Restore each of the list in join_rel_list, agg_info_list and
- * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
- * original hashtable if any.
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+ * back original hashtable if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
savelength_join_rel);
@@ -308,14 +309,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/*
* Except for the topmost scan/join rel, consider generating
- * partial aggregation paths for the grouped relation on top of the
- * paths of this rel. After that, we're done creating paths for
- * the grouped relation, so run set_cheapest().
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
*/
if (!bms_equal(joinrel->relids, root->all_query_rels))
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
rel_grouped = find_grouped_rel(root, joinrel->relids,
&agg_info);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 359eee3486..3602dcacfa 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -360,19 +360,19 @@ setup_base_grouped_rels(PlannerInfo *root)
for (rti = 1; rti < root->simple_rel_array_size; rti++)
{
- RelOptInfo *rel = root->simple_rel_array[rti];
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
/* there may be empty slots corresponding to non-baserel RTEs */
if (rel == NULL)
continue;
- Assert(rel->relid == rti); /* sanity check on array */
+ Assert(rel->relid == rti); /* sanity check on array */
/*
- * Ignore RTEs that are not simple rels. Note that we need to consider
- * "other rels" here.
+ * Ignore RTEs that are not simple rels. Note that we need to
+ * consider "other rels" here.
*/
if (!IS_SIMPLE_REL(rel))
continue;
@@ -1366,8 +1366,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
static void
set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
/* Add paths to the grouped base relation if one exists. */
rel_grouped = find_grouped_rel(root, rel->relids,
@@ -3419,8 +3419,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
/*
- * Determine whether it's possible to perform sort-based implementations of
- * grouping.
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
*/
can_sort = grouping_is_sortable(agg_info->group_clauses);
@@ -3481,9 +3481,9 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
int presorted_keys;
/*
- * Since the path originates from the non-grouped relation which is
- * not aware of eager aggregation, we must ensure that it provides
- * the correct input for the partial aggregation.
+ * Since the path originates from the non-grouped relation which
+ * is not aware of eager aggregation, we must ensure that it
+ * provides the correct input for the partial aggregation.
*/
path = (Path *) create_projection_path(root,
rel_grouped,
@@ -3527,8 +3527,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
}
/*
- * qual is NIL because the HAVING clause cannot be evaluated until the
- * final value of the aggregate is known.
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
*/
path = (Path *) create_agg_path(root,
rel_grouped,
@@ -3558,9 +3558,9 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
int presorted_keys;
/*
- * Since the path originates from the non-grouped relation which is
- * not aware of eager aggregation, we must ensure that it provides
- * the correct input for the partial aggregation.
+ * Since the path originates from the non-grouped relation which
+ * is not aware of eager aggregation, we must ensure that it
+ * provides the correct input for the partial aggregation.
*/
path = (Path *) create_projection_path(root,
rel_grouped,
@@ -3605,8 +3605,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
}
/*
- * qual is NIL because the HAVING clause cannot be evaluated until the
- * final value of the aggregate is known.
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
*/
path = (Path *) create_agg_path(root,
rel_grouped,
@@ -3628,12 +3628,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
*/
if (can_hash && cheapest_total_path != NULL)
{
- Path *path;
+ Path *path;
/*
* Since the path originates from the non-grouped relation which is
- * not aware of eager aggregation, we must ensure that it provides
- * the correct input for the partial aggregation.
+ * not aware of eager aggregation, we must ensure that it provides the
+ * correct input for the partial aggregation.
*/
path = (Path *) create_projection_path(root,
rel_grouped,
@@ -3641,8 +3641,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
agg_info->agg_input);
/*
- * qual is NIL because the HAVING clause cannot be evaluated until
- * the final value of the aggregate is known.
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
*/
path = (Path *) create_agg_path(root,
rel_grouped,
@@ -3663,12 +3663,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
*/
if (can_hash && cheapest_partial_path != NULL)
{
- Path *path;
+ Path *path;
/*
* Since the path originates from the non-grouped relation which is
- * not aware of eager aggregation, we must ensure that it provides
- * the correct input for the partial aggregation.
+ * not aware of eager aggregation, we must ensure that it provides the
+ * correct input for the partial aggregation.
*/
path = (Path *) create_projection_path(root,
rel_grouped,
@@ -3676,8 +3676,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
agg_info->agg_input);
/*
- * qual is NIL because the HAVING clause cannot be evaluated until
- * the final value of the aggregate is known.
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
*/
path = (Path *) create_agg_path(root,
rel_grouped,
@@ -3880,14 +3880,14 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/*
* Except for the topmost scan/join rel, consider generating
- * partial aggregation paths for the grouped relation on top of the
- * paths of this rel. After that, we're done creating paths for
- * the grouped relation, so run set_cheapest().
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
*/
if (!bms_equal(rel->relids, root->all_query_rels))
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
rel_grouped = find_grouped_rel(root, rel->relids,
&agg_info);
@@ -4777,8 +4777,8 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
rel->top_parent_relids : rel->relids,
root->all_query_rels))
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
rel_grouped = find_grouped_rel(root, child_rel->relids,
&agg_info);
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 78a88c9d3b..23bbef15f0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -905,12 +905,12 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist)
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info = NULL;
- RelOptInfo *rel1_grouped;
- RelOptInfo *rel2_grouped;
- bool rel1_empty;
- bool rel2_empty;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
/*
* If there are no aggregate expressions or grouping expressions, eager
@@ -975,9 +975,9 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
/*
* Join of two grouped relations is currently not supported. In such a
* case, grouping of one side would change the occurrence of the other
- * side's aggregate transient states on the input of the final aggregation.
- * This can be handled by adjusting the transient states, but it's not
- * worth the effort for now.
+ * side's aggregate transient states on the input of the final
+ * aggregation. This can be handled by adjusting the transient states, but
+ * it's not worth the effort for now.
*/
if (!rel1_empty && !rel2_empty)
return;
@@ -989,6 +989,7 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
sjinfo, restrictlist);
populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
sjinfo, restrictlist);
+
/*
* It shouldn't happen that we have marked rel1_grouped as dummy in
* populate_joinrel_with_paths due to provably constant-false join
@@ -1003,6 +1004,7 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
sjinfo, restrictlist);
populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
sjinfo, restrictlist);
+
/*
* It shouldn't happen that we have marked rel2_grouped as dummy in
* populate_joinrel_with_paths due to provably constant-false join
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 9f05edfbac..f093ef0f13 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -430,9 +430,9 @@ create_agg_clause_infos(PlannerInfo *root)
}
/*
- * Aggregates within the HAVING clause need to be processed in the same way
- * as those in the targetlist. Note that HAVING can contain Aggrefs but
- * not WindowFuncs.
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
*/
if (root->parse->havingQual != NULL)
{
@@ -504,10 +504,10 @@ create_grouping_expr_infos(PlannerInfo *root)
SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
TypeCacheEntry *tce;
- Oid equalimageproc;
- Oid eq_op;
- List *eq_opfamilies;
- Oid btree_opfamily;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
Assert(tle->ressortgroupref > 0);
@@ -518,11 +518,11 @@ create_grouping_expr_infos(PlannerInfo *root)
return;
/*
- * Eager aggregation is only possible if equality of grouping keys
- * per the equality operator implies bitwise equality. Otherwise, if
- * we put keys of different byte images into the same group, we lose
- * some information that may be needed to evaluate join clauses above
- * the pushed-down aggregate node, or the WHERE clause.
+ * Eager aggregation is only possible if equality of grouping keys per
+ * the equality operator implies bitwise equality. Otherwise, if we
+ * put keys of different byte images into the same group, we lose some
+ * information that may be needed to evaluate join clauses above the
+ * pushed-down aggregate node, or the WHERE clause.
*
* For example, the NUMERIC data type is not supported because values
* that fall into the same group according to the equality operator
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b69efb3cd1..72a45f1b01 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4084,8 +4084,8 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/*
* Now choose the best path(s) for partially_grouped_rel.
*
- * Note that the non-partial paths can come either from the Gather above or
- * from eager aggregation.
+ * Note that the non-partial paths can come either from the Gather above
+ * or from eager aggregation.
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
@@ -7245,8 +7245,8 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* It is possible that the partially_grouped_rel created by eager
- * aggregation is dummy. In this case we just set it to NULL. It might be
- * created again by the following logic if possible.
+ * aggregation is dummy. In this case we just set it to NULL. It might
+ * be created again by the following logic if possible.
*/
if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
partially_grouped_rel = NULL;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 08de77d439..27ac853c0a 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -542,7 +542,7 @@ adjust_appendrel_attrs_mutator(Node *node,
if (oldtarget->sortgrouprefs)
{
- Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
newtarget->exprs = (List *)
adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 91013e1a80..55ff082b9b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -47,7 +47,7 @@ typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
void *data;
-} RelInfoEntry;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -435,13 +435,13 @@ RelOptInfo *
build_simple_grouped_rel(PlannerInfo *root, int relid,
RelAggInfo **agg_info_p)
{
- RelOptInfo *rel_plain;
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
/*
- * We should have available aggregate expressions and grouping expressions,
- * otherwise we cannot reach here.
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
*/
Assert(root->agg_clause_list != NIL);
Assert(root->group_expr_list != NIL);
@@ -481,7 +481,7 @@ build_simple_grouped_rel(PlannerInfo *root, int relid,
RelOptInfo *
build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
{
- RelOptInfo *rel_grouped;
+ RelOptInfo *rel_grouped;
rel_grouped = makeNode(RelOptInfo);
memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
@@ -2672,13 +2672,13 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
/*
* If this is a child rel, the grouped rel for its parent rel must have
- * been created if it can. So we can just use parent's RelAggInfo if there
- * is one, with appropriate variable substitutions.
+ * been created if it can. So we can just use parent's RelAggInfo if
+ * there is one, with appropriate variable substitutions.
*/
if (IS_OTHER_REL(rel))
{
- RelOptInfo *rel_grouped;
- RelAggInfo *agg_info;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
Assert(!bms_is_empty(rel->top_parent_relids));
rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
@@ -2761,8 +2761,8 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
* Initialize the SortGroupClause.
*
* As the final aggregation will not use this grouping expression,
- * we don't care whether sortop is < or >. The value of nulls_first
- * should not matter for the same reason.
+ * we don't care whether sortop is < or >. The value of
+ * nulls_first should not matter for the same reason.
*/
cl->tleSortGroupRef = ++sortgroupref;
get_sort_group_operators(var->vartype,
@@ -2826,7 +2826,7 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
foreach(lc, root->agg_clause_list)
{
AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
- Aggref *aggref;
+ Aggref *aggref;
Assert(IsA(ac_info->aggref, Aggref));
@@ -2848,8 +2848,8 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
result->agg_input = agg_input;
/*
- * The number of aggregation input rows is simply the number of rows of the
- * non-grouped relation, which should have been estimated by now.
+ * The number of aggregation input rows is simply the number of rows of
+ * the non-grouped relation, which should have been estimated by now.
*/
result->input_rows = rel->rows;
@@ -2920,7 +2920,8 @@ eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
Assert(IsA(ac_info->aggref, Aggref));
/*
- * Give up if any aggregate needs relations other than the current one.
+ * Give up if any aggregate needs relations other than the current
+ * one.
*
* If the aggregate needs the current rel plus anything else, then the
* problem is that grouping of the current relation could make some
@@ -3012,7 +3013,7 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
}
else
{
- bool safe_to_push;
+ bool safe_to_push;
if (is_var_needed_by_join(root, (Var *) expr, rel, &safe_to_push))
{
@@ -3024,17 +3025,17 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
return false;
/*
- * The expression is needed for a join, however it's neither in
- * the GROUP BY clause nor can it be derived from it using EC.
- * (Otherwise it would have already been added to the targets
- * above.) We need to construct a special SortGroupClause for
- * this expression.
+ * The expression is needed for a join, however it's neither
+ * in the GROUP BY clause nor can it be derived from it using
+ * EC. (Otherwise it would have already been added to the
+ * targets above.) We need to construct a special
+ * SortGroupClause for this expression.
*
* Note that its tleSortGroupRef needs to be unique within
* agg_input, so we need to postpone creation of this
* SortGroupClause until we're done with the iteration of
- * rel->reltarget->exprs. And it makes sense for the caller to
- * do some more checks before it starts to create those
+ * rel->reltarget->exprs. And it makes sense for the caller
+ * to do some more checks before it starts to create those
* SortGroupClauses.
*/
*group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
@@ -3045,9 +3046,9 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
* Another reason we might need this variable is that some
* aggregate pushed down to this relation references it. In
* such a case, add it to "agg_input", but not to "target".
- * However, if the aggregate is not the only reason for the var
- * to be in the target, some more checks need to be performed
- * below.
+ * However, if the aggregate is not the only reason for the
+ * var to be in the target, some more checks need to be
+ * performed below.
*/
add_new_column_to_pathtarget(agg_input, expr);
}
@@ -3101,8 +3102,8 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
* expression but not referenced by any join.
*
* If the eager aggregation will support generic grouping
- * expression in the future, create_rel_agg_info() will have to add
- * this variable to "agg_input" target and also add the whole
+ * expression in the future, create_rel_agg_info() will have to
+ * add this variable to "agg_input" target and also add the whole
* generic expression to "target".
*/
return false;
@@ -3128,7 +3129,7 @@ is_var_in_aggref_only(PlannerInfo *root, Var *var)
foreach(lc, root->agg_clause_list)
{
AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
- List *vars;
+ List *vars;
Assert(IsA(ac_info->aggref, Aggref));
@@ -3188,9 +3189,9 @@ is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel,
attno = var->varattno - baserel->min_attr;
/*
- * If the baserel this Var belongs to can be nulled by outer joins that are
- * above the current rel, then it is not safe to use this Var as a grouping
- * key at current rel level.
+ * If the baserel this Var belongs to can be nulled by outer joins that
+ * are above the current rel, then it is not safe to use this Var as a
+ * grouping key at current rel level.
*/
*safe_to_push = bms_is_subset(baserel->nulling_relids, rel->relids);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4ce70f256d..e32d96769c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -291,7 +291,7 @@ struct PlannerInfo
* join_rel_list is a list of all join-relation RelOptInfos we have
* considered in this planning run.
*/
- RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -435,7 +435,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/*
* list of grouped relation RelAggInfos. One instance of RelAggInfo per
@@ -1147,10 +1147,10 @@ typedef struct RelAggInfo
struct PathTarget *agg_input;
/* estimated number of input tuples for the grouped paths */
- Cardinality input_rows;
+ Cardinality input_rows;
- /* estimated number of result tuples of the grouped relation*/
- Cardinality grouped_rows;
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
/* a list of SortGroupClause's */
List *group_clauses;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6c1caf649..4019a5fee9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1058,6 +1059,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -2364,12 +2366,14 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
RelIdCacheEnt
RelInfo
RelInfoArr
+RelInfoList
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
Richard:
Thanks for reviving this patch and for all of your work on it! Eager
aggregation pushdown will be beneficial for my work and I'm hoping to see
it land.
I was playing around with v9 of the patches and was specifically curious
about this previous statement...
This patch also makes eager aggregation work with outer joins. With
outer join, the aggregate cannot be pushed down if any column referenced
by grouping expressions or aggregate functions is nullable by an outer
join above the relation to which we want to apply the partiall
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
...and this related comment in eager_aggregate.out:
-- Ensure aggregation cannot be pushed down to the nullable side
While I'm new to this work and its subtleties, I'm wondering if this is too
broad a condition.
I modified the first test query in eager_aggregate.sql to make it a LEFT
JOIN and eager aggregation indeed did not happen, which is expected based
on the comments upthread.
query:
SET enable_eager_aggregate=ON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON
t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
plan:
QUERY PLAN
------------------------------------------------------------
GroupAggregate
Output: t1.a, sum(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Right Join
Output: t1.a, t2.c
Hash Cond: (t2.b = t1.b)
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
-> Hash
Output: t1.a, t1.b
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b
(15 rows)
(NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity)
But, it seems that eager aggregation for the query above can be
"replicated" as:
query:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
SELECT b, sum(c) c
FROM eager_agg_t2 t2p
GROUP BY b
) t2 ON t1.b = t2.b
GROUP BY t1.a
ORDER BY t1.a;
The outputs of both the original query and this one match (and the plans
with eager aggregation and the subquery are nearly identical if you restore
the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does
this mean that there are conditions under which eager aggregation can be
pushed down to the nullable side?
-Paul-
On Sat, Jul 6, 2024 at 4:56 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <guofenglinux@gmail.com> wrote:
I spent some time testing this patchset and found a few more issues.
...
Hence here is the v8 patchset, with fixes for all the above issues.
I found an 'ORDER/GROUP BY expression not found in targetlist' error
with this patchset, with the query below:
create table t (a boolean);
set enable_eager_aggregate to on;
explain (costs off)
select min(1) from t t1 left join t t2 on t1.a group by (not (not
t1.a)), t1.a order by t1.a;
ERROR: ORDER/GROUP BY expression not found in targetlist
This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them. The fully-processed
groupClause is kept in root->processed_groupClause. However, when
collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.
The fix is straightforward: check root->processed_groupClause instead.
Here is a new rebase with this fix.
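In code form, the fix is essentially this (a sketch; the surrounding
loop in create_grouping_expr_infos is abbreviated):

-    foreach(lc, parse->groupClause)
+    foreach(lc, root->processed_groupClause)
     {
         SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
         ...
     }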
Thanks
Richard
On Sun, Jul 7, 2024 at 10:45 AM Paul George <p.a.george19@gmail.com> wrote:
Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.
Thanks for looking at this patch!
The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?
I think it's a very risky thing to push a partial aggregation down to
the nullable side of an outer join, because the NULL-extended rows
produced by the outer join would not be available when we perform the
partial aggregation, while with a non-eager-aggregation plan these
rows are available for the top-level aggregation. This may put the
rows into groups in a different way than expected, or get wrong values
from the aggregate functions. I've managed to compose an example:
create table t (a int, b int);
insert into t select 1, 1;
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
a | count
---+-------
| 1
(1 row)
This is the expected result, because after the outer join we have got
a NULL-extended row.
But if we somehow push down the partial aggregation to the nullable
side of this outer join, we would get a wrong result.
explain (costs off)
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
QUERY PLAN
-------------------------------------------
Finalize HashAggregate
Group Key: t2.a
-> Nested Loop Left Join
Filter: (t2.a IS NULL)
-> Seq Scan on t t1
-> Materialize
-> Partial HashAggregate
Group Key: t2.a
-> Seq Scan on t t2
Filter: (b > 1)
(10 rows)
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
a | count
---+-------
| 0
(1 row)
I believe there are cases where pushing a partial aggregation down to
the nullable side of an outer join can be safe, but I doubt that there
is an easy way to identify these cases and do the push-down for them.
So for now I think we'd better refrain from doing that.
Thanks
Richard
Hey Richard,
Looking more closely at this example
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by t2.a
having t2.a is null;
I wonder if the inability to exploit eager aggregation is more based on the
fact that COUNT(*) cannot be decomposed into an aggregation of PARTIAL
COUNT(*)s (apologies if my terminology is off/made up...I'm new to the
codebase). In other words, is it the case that a given aggregate function
already has built-in protection against the error case you correctly
pointed out?
To highlight this, in the simple example below we don't see aggregate
pushdown even with an INNER JOIN when the agg function is COUNT(*) but we
do when it's COUNT(t2.*):
-- same setup
drop table if exists t;
create table t(a int, b int, c int);
insert into t select i % 100, i % 10, i from generate_series(1, 1000) i;
analyze t;
-- query 1: COUNT(*) --> no pushdown
set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(*) from t t1 join t t2 on
t1.a=t2.a group by t1.a;
QUERY PLAN
-------------------------------------------
HashAggregate
Output: t1.a, count(*)
Group Key: t1.a
-> Hash Join
Output: t1.a
Hash Cond: (t1.a = t2.a)
-> Seq Scan on public.t t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.a
-> Seq Scan on public.t t2
Output: t2.a
(12 rows)
-- query 2: COUNT(t2.*) --> agg pushdown
set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(t2.*) from t t1 join t t2
on t1.a=t2.a group by t1.a;
QUERY PLAN
-------------------------------------------------------
Finalize HashAggregate
Output: t1.a, count(t2.*)
Group Key: t1.a
-> Hash Join
Output: t1.a, (PARTIAL count(t2.*))
Hash Cond: (t1.a = t2.a)
-> Seq Scan on public.t t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.a, (PARTIAL count(t2.*))
-> Partial HashAggregate
Output: t2.a, PARTIAL count(t2.*)
Group Key: t2.a
-> Seq Scan on public.t t2
Output: t2.*, t2.a
(15 rows)
...while it might be true that COUNT(*) ... INNER JOIN should allow eager
agg pushdown (I haven't thought deeply about it, TBH), I did find this
result pretty interesting.
-Paul
I did a self-review of this patchset and some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
Attached is the updated version of the patchset. Previously, the
patchset was not well-split, which made it time-consuming to
distribute the changes across the patches during the refactoring. So
I squashed them into two patches to save effort.
Thanks
Richard
Attachments:
v10-0002-Implement-Eager-Aggregation.patch (application/octet-stream)
From aaac4e4c4bcd1d259a95ca7b99288fefa3dd832d Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:01:26 +0900
Subject: [PATCH v10 2/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys. This ensures that we have the correct input for the upper joins
and that an aggregated row from the partial aggregation matches the
other side of the join if and only if each row in the partial group
does, which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final path will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many upper relations of partial
aggregation, we introduce a RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table
for faster lookups not only for join relations but also for upper
relations.
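For reference, the structure is essentially the following (a sketch
showing just the two fields that the rest of the patch relies on; the
real definition lives in pathnodes.h):

    typedef struct RelInfoList
    {
        List        *items; /* list of RelOptInfos or RelAggInfos */
        struct HTAB *hash;  /* optional hashtable for fast lookups */
    } RelInfoList;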
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
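The feature can be enabled with:

    SET enable_eager_aggregate = on;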
---
src/backend/optimizer/README | 79 +
src/backend/optimizer/geqo/geqo_eval.c | 104 +-
src/backend/optimizer/path/allpaths.c | 441 ++++++
src/backend/optimizer/path/joinrels.c | 135 ++
src/backend/optimizer/plan/initsplan.c | 252 ++++
src/backend/optimizer/plan/planmain.c | 12 +
src/backend/optimizer/plan/planner.c | 100 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 770 +++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 100 ++
src/include/optimizer/pathnode.h | 9 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 4 +
21 files changed, 3488 insertions(+), 97 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..6f79ef531e 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys. This ensures that we
+have the correct input for the upper joins and that an aggregated row from the
+partial aggregation matches the other side of the join if and only if each row
+in the partial group does, which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest
+non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final path will compete in the usual way
+with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..b77805d27d 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
+ RelInfoListInfo save_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +89,33 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original
+ * length. NOTE this assumes that any added entries are appended at the
+ * end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel =
+ save_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]);
+ save_grouped_info = save_relinfolist(root->agg_info_list);
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +137,14 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the list in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+ * back original hashtable if any.
*/
- root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ &save_grouped_rel);
+ restore_relinfolist(root->agg_info_list, &save_grouped_info);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +300,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +378,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b550e707a4..03795a0ec4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,53 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +616,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1294,6 +1360,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3302,6 +3390,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed are provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+	/* Estimate the number of groups for non-partial and partial paths. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3462,6 +3855,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+	 * We also run generate_grouped_paths() for the grouped relation of
+	 * each just-processed joinrel, and afterwards run set_cheapest() for
+	 * that grouped relation.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3482,6 +3879,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4350,6 +4768,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..e1a2d3b414 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,127 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+	 * Joining two grouped relations is currently not supported: grouping
+	 * one side would change the number of times the other side's aggregate
+	 * transient states appear in the input of the final aggregation. While
+	 * this could be compensated for by adjusting the transient states, it
+	 * does not seem worthwhile for now.
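+	 *
+	 * For instance, in a query like "SELECT sum(t1.x), sum(t2.y) FROM t1
+	 * JOIN t2 ON t1.a = t2.a GROUP BY ...", grouping t1 would change how
+	 * many join rows each partial state of sum(t2.y) appears in, and hence
+	 * how many times it gets combined by the final aggregation.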
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+		 * populate_joinrel_with_paths should not have marked rel1_grouped as
+		 * dummy due to provably constant-false join restrictions, so we
+		 * cannot end up with a plan that has an Aggref in a non-Agg plan
+		 * node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+		 * populate_joinrel_with_paths should not have marked rel2_grouped as
+		 * dummy due to provably constant-false join restrictions, so we
+		 * cannot end up with a plan that has an Aggref in a non-Agg plan
+		 * node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1804,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..2ca035dd80 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,255 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+	 * We don't try to apply eager aggregation if there are set-returning
+	 * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+	 * Collect aggregate expressions and plain Vars that appear in the
+	 * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * collect plain Vars for future reference
+ */
+ if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * create_grouping_expr_infos
+ *	  Create a GroupExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we simply return, leaving
+ * root->group_expr_list NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys, as
+ * defined by the equality operator, implies bitwise equality.
+ * Otherwise, if we put keys with different byte images into the same
+ * group, we may lose some information that could be needed to
+ * evaluate upper qual clauses.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+		 * Get the equality operator of the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd8b2b0ca3..ece6936e23 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -77,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -258,6 +264,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 948afd9094..b403a46d53 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4082,23 +4079,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
/*
- * Estimate number of groups.
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above
+ * or from eager aggregation.
*/
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6966,16 +6961,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7087,7 +7108,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7104,7 +7125,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7112,7 +7133,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7154,19 +7175,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7216,6 +7235,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+	 * The partially_grouped_rel may have already been created by eager
+	 * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+	 * The partially_grouped_rel created by eager aggregation may be dummy.
+	 * In that case we just set it to NULL; the logic below may create it
+	 * anew if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7239,19 +7273,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+	 * Note that the partially_grouped_rel may already have been created and
+	 * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7260,6 +7302,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4989722637..4884d9ddea 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+		/* The expressions must be adjusted even if sortgrouprefs is unset */
+		newtarget->exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+										   context);
+
+		if (oldtarget->sortgrouprefs)
+		{
+			Size		nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+			newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+			memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+		}
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 54e042a8a5..3cb450b376 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2702,8 +2702,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2955,8 +2954,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3002,8 +3000,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3161,8 +3158,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 76e13971f7..eec678b93c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,22 +28,25 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -87,6 +91,15 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p,
+ Index *maxSortGroupRef);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -410,6 +423,101 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We should have available aggregate expressions and grouping
+	 * expressions; otherwise we would not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = find_base_rel(root, relid);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -484,7 +592,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation-specific entries.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +612,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific entries */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +640,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +673,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +681,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,44 +707,46 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
- * add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
- * hashtable if there is one.
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * is also returned in *agg_info_p.
*/
-static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
{
- /* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ RelOptInfo *rel;
- /* store it into the auxiliary hashtable if there is one. */
- if (list->hash)
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
{
- RelInfoEntry *hentry;
- bool found;
+ if (agg_info_p)
+ *agg_info_p = NULL;
- hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->rel = rel;
+ return NULL;
}
-}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- add_rel_info(root->join_rel_list, joinrel);
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
}
/*
@@ -672,6 +798,64 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
+/*
+ * add_rel_info
+ *	  Add a relation-specific entry to the given list, and also add it to
+ *	  the auxiliary hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, void *data)
+{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, data);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+ Relids relids;
+
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &relids,
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->data = data;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1491,7 +1675,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
@@ -2528,3 +2712,503 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ *	  Create the RelAggInfo structure for the given relation if it can
+ *	  produce grouped paths. The given relation is the non-grouped one,
+ *	  whose reltarget has already been constructed.
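+ *
+ * For example, given "SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i" and "b" as the given relation, 'agg_input' would emit b.j
+ * and b.y, while 'target' would emit b.j plus the partial aggregate of
+ * avg(b.y).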
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ Index maxSortGroupRef;
+ List *grp_exprs_extra = NIL;
+ List *eager_group_clauses;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+	 * If this is a child rel, the grouped rel for its parent rel must have
+	 * already been created if that was possible. So we can simply reuse the
+	 * parent's RelAggInfo, if there is one, with appropriate variable
+	 * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &grp_exprs_extra, &maxSortGroupRef))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (maxSortGroupRef == 0 &&
+ list_length(grp_exprs_extra) == 0)
+ return NULL;
+
+ /*
+ * With the current max SortGroupRef within agg_input determined, we can
+ * now add the expressions that are needed by upper joins to the grouping
+ * clauses and the targets.
+ */
+ eager_group_clauses = list_copy(root->processed_groupClause);
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ cl->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+
+ eager_group_clauses = lappend(eager_group_clauses, cl);
+
+ /* This Var should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in eager_group_clauses */
+ cl = get_sortgroupref_clause(sortgroupref, eager_group_clauses);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+	 * Calculate pathkeys that represent the grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
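+	 *
+	 * For example, in "SELECT b.j, count(*) FROM a LEFT JOIN b ON a.i =
+	 * b.i GROUP BY b.j", the rows NULL-extended for unmatched "a" rows
+	 * belong to the b.j IS NULL group, yet a partial aggregation pushed
+	 * down to "b" would never see them.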
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+	 * Check if all aggregate expressions can be evaluated at this relation
+	 * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
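+		 *
+		 * For example, an aggregate like sum(t1.x + t2.y) can only be
+		 * evaluated after t1 and t2 have been joined, so it cannot be
+		 * pushed down to either t1 or t2 alone.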
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * *group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClauses. Those Vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * *maxSortGroupRef receives the max SortGroupRef within agg_input.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p,
+ Index *maxSortGroupRef)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ *maxSortGroupRef = 0;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /* Update the max SortGroupRef */
+ if (sortgroupref > *maxSortGroupRef)
+ *maxSortGroupRef = sortgroupref;
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of this
+ * SortGroupClause until we're done with the iteration of
+ * rel->reltarget->exprs.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
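+ *
+ * For example, in "SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i", b.y is such a Var: it must be emitted by the paths that
+ * feed the partial aggregation, but not by the grouped paths themselves.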
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be used as a grouping key, because otherwise it would not
+ * survive the partial aggregation to reach the input of the join above.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final output. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
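+ *
+ * For example, given "GROUP BY t1.a" and the join clause "t1.a = t2.a",
+ * the expression t2.a can also serve as a grouping key, since the
+ * equivalence-class machinery proves it equal to t1.a.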
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
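+ /* Currently only plain Vars are collected as grouping expressions */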
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 79ecaa4c4c..d3d86a108a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 667e0dc40a..2e9df56cf4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1951ae7c11..815c14c71d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -387,6 +387,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -429,6 +438,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped-relation RelAggInfos, with one instance per item of the
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
@@ -1079,6 +1094,56 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
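+ *
+ * For example, when avg(t2.c) is pushed down and grouped by t2.b,
+ * "agg_input" contains t2.b and t2.c, while "target" contains t2.b and the
+ * partial aggregate of t2.c.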
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3147,6 +3212,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * Information about an aggregate expression that appears in the targetlist
+ * or the HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * Information about a grouping expression that appears in the GROUP BY
+ * clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f00bd55f39..d5282e916b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,10 +310,18 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -349,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 970499c469..9392a27a4d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..03ff11f8e0
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1293 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform a scan of one table, partially aggregate the result, join it to
+-- the other table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two of the tables, partially aggregate the result, join it to the
+-- remaining table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY with another key that matches the partition key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check eager aggregation over a join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check eager aggregation over a join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..1dda69e7c2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2429ec2bba..d5697e5655 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform a scan of one table, partially aggregate the result, join it to
+-- the other table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Join two of the tables, partially aggregate the result, join it to the
+-- remaining table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5255160212..347c82fe1a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1060,6 +1061,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -2370,6 +2372,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
@@ -2378,6 +2381,7 @@ RelInfo
RelInfoArr
RelInfoEntry
RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
Attachment: v10-0001-Introduce-RelInfoList-structure.patch (application/octet-stream)
From 4ef89693cc376a8d16b40bf403712b5ad171471c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v10 1/2] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 32 +++--
src/tools/pgindent/typedefs.list | 3 +-
7 files changed, 136 insertions(+), 98 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e21..be4038f64f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6079,7 +6079,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 057b4b79eb..b550e707a4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..fd8b2b0ca3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..76e13971f7 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1457,22 +1485,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1491,7 +1511,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..1951ae7c11 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,26 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relation-specific
+ * information.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +290,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +427,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 547d14b3e7..5255160212 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1291,7 +1291,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2377,6 +2376,8 @@ RelFileNumber
RelIdCacheEnt
RelInfo
RelInfoArr
+RelInfoEntry
+RelInfoList
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I did a self-review of this patchset and refactored some of the code,
especially the function that creates the RelAggInfo structure for a
given relation. There are no major changes, but the code should now
be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
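For illustration, consider a query shape like the following (a
hypothetical example; whether it actually triggers the bug depends on
the plan chosen):

SELECT t2.b, avg(t2.c)
FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t2.b;

Here t2.b already has a tleSortGroupRef assigned in the query's
groupClause, and it is also a join key that must be included in the
grouping keys of the partial aggregation pushed down to t2, so the
GROUP BY clause we build for that partial aggregation must not end up
carrying a duplicate of the existing tleSortGroupRef.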
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
Thanks
Richard
Attachments:
v11-0001-Introduce-RelInfoList-structure.patch (application/octet-stream)
From 7ebfb64cd28e0ada3cc62065bfe8e9cdf685b497 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v11 1/2] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 32 +++--
src/tools/pgindent/typedefs.list | 3 +-
7 files changed, 136 insertions(+), 98 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fc65d81e21..be4038f64f 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6079,7 +6079,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 057b4b79eb..b550e707a4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..fd8b2b0ca3 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..76e13971f7 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookups for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1457,22 +1485,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1491,7 +1511,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 14ccfc1ac1..1951ae7c11 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,26 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relation-specific
+ * information.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +290,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -413,7 +427,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6d424c8918..502c748ecd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1292,7 +1292,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2378,6 +2377,8 @@ RelFileNumber
RelIdCacheEnt
RelInfo
RelInfoArr
+RelInfoEntry
+RelInfoList
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
v11-0002-Implement-Eager-Aggregation.patch (application/octet-stream)
From 2af51976b33edfbe7d3c28d84c270366718e4a06 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 16:01:26 +0900
Subject: [PATCH v11 2/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys. This ensures that we have the correct input for the upper joins
and that an aggregated row from the partial aggregation matches the
other side of the join if and only if each row in the partial group
does, which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final path will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many upper relations of partial
aggregation, we introduce a RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table
for faster lookups not only for join relations but also for upper
relations.
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
---
src/backend/optimizer/README | 79 +
src/backend/optimizer/geqo/geqo_eval.c | 104 +-
src/backend/optimizer/path/allpaths.c | 441 ++++++
src/backend/optimizer/path/joinrels.c | 135 ++
src/backend/optimizer/plan/initsplan.c | 252 ++++
src/backend/optimizer/plan/planmain.c | 12 +
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 737 +++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 100 ++
src/include/optimizer/pathnode.h | 9 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 4 +
21 files changed, 3467 insertions(+), 99 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..6f79ef531e 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys. This ensures that we
+have the correct input for the upper joins and that an aggregated row from the
+partial aggregation matches the other side of the join if and only if each row
+in the partial group does, which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest
+non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final path will compete in the usual way
+with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..b77805d27d 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
+ RelInfoListInfo save_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +89,33 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original
+ * length. NOTE this assumes that any added entries are appended at the
+ * end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel =
+ save_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]);
+ save_grouped_info = save_relinfolist(root->agg_info_list);
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +137,14 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put
+ * back original hashtable if any.
*/
- root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ &save_grouped_rel);
+ restore_relinfolist(root->agg_info_list, &save_grouped_info);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +300,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +378,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b550e707a4..03795a0ec4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,53 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +616,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1294,6 +1360,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3302,6 +3390,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3462,6 +3855,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * We also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3482,6 +3879,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4350,6 +4768,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..e1a2d3b414 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +891,127 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
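+/*
+ * For illustration (a hypothetical example), when joining rels 'a' and 'b'
+ * for
+ *
+ *   SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;
+ *
+ * we join the grouped paths of 'b' (partially aggregated on b.j) to the
+ * plain paths of 'a', and the resulting paths belong to the grouped join
+ * relation.
+ */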
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * Joining two grouped relations is currently not supported. Grouping
+ * one side would alter the multiplicity of the other side's aggregate
+ * transient states in the final aggregation input. While this could be
+ * addressed by adjusting the transient states, it does not seem
+ * worthwhile for now.
+ */
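+ /*
+ * For example (illustrative): when we join grouped 'b' to plain 'a',
+ * each of b's partial aggregate states is repeated once per matching
+ * row of 'a', which is exactly the multiplicity the final aggregation
+ * expects. If 'a' were grouped as well, the states would be repeated
+ * fewer times and the combined results would be wrong.
+ */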
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should never have marked rel1_grouped
+ * as dummy due to provably constant-false join restrictions; if it
+ * had, we could end up with a plan that has Aggrefs in a non-Agg
+ * plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should never have marked rel2_grouped
+ * as dummy due to provably constant-false join restrictions; if it
+ * had, we could end up with a plan that has Aggrefs in a non-Agg
+ * plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1804,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..2ca035dd80 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,255 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
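+/*
+ * For instance (a hypothetical example), given
+ *
+ *   SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ *   GROUP BY a.i HAVING sum(b.y) > 0;
+ *
+ * we create one AggClauseInfo for avg(b.y) and one for sum(b.y), each with
+ * agg_eval_at = {b}, and record the plain Var a.i in root->tlist_vars.
+ */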
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * collect plain Vars for future reference
+ */
+ if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys, as
+ * defined by the equality operator, implies bitwise equality.
+ * Otherwise, if we put keys with different byte images into the same
+ * group, we may lose some information that could be needed to
+ * evaluate upper qual clauses.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
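+ /*
+ * As an illustrative (hypothetical) case: with a NUMERIC grouping
+ * key, partially grouping 0 and 0.0 together would pick one
+ * representative value, so an upper clause such as n::text = '0.0'
+ * might then be evaluated against the wrong byte image.
+ */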
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index fd8b2b0ca3..ece6936e23 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -67,6 +67,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -77,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -258,6 +264,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 948afd9094..89a8f39031 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4082,23 +4079,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6966,16 +6956,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7087,7 +7103,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7104,7 +7120,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7112,7 +7128,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7154,19 +7170,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7216,6 +7230,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * The partially_grouped_rel created by eager aggregation may be dummy.
+ * In that case we just reset it to NULL; the logic below will recreate
+ * it if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7239,19 +7268,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7260,6 +7297,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4989722637..4884d9ddea 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* Adjust the exprs regardless of whether sortgrouprefs is set */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 54e042a8a5..3cb450b376 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2702,8 +2702,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
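+ /* Grouped paths may now lie below joins, so keep the subpath's parameterization */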
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2955,8 +2954,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3002,8 +3000,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3161,8 +3158,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 76e13971f7..29806a3965 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,22 +28,25 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookups of RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -87,6 +91,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -410,6 +422,101 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We can only get here if there are aggregate expressions and grouping
+ * expressions available.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = find_base_rel(root, relid);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -484,7 +591,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation-specific entries.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -504,19 +611,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation-specific entries */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -524,9 +639,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -557,7 +672,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -565,10 +680,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -583,44 +706,46 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
- * add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
- * hashtable if there is one.
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * is also returned in *agg_info_p.
*/
-static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
{
- /* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ RelOptInfo *rel;
- /* store it into the auxiliary hashtable if there is one. */
- if (list->hash)
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
{
- RelInfoEntry *hentry;
- bool found;
+ if (agg_info_p)
+ *agg_info_p = NULL;
- hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->rel = rel;
+ return NULL;
}
-}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- add_rel_info(root->join_rel_list, joinrel);
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
}
/*
@@ -672,6 +797,64 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
+/*
+ * add_rel_info
+ * Add a relation-specific entry to a list, and also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, void *data)
+{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, data);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+ Relids relids;
+
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &relids,
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->data = data;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1491,7 +1674,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
@@ -2528,3 +2711,471 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one, whose
+ * reltarget must already have been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must
+ * already have been created if that was possible. So we can just use
+ * the parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
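+ /*
+ * For example (illustrative), in
+ *
+ *   SELECT a.i, count(b.y) FROM a LEFT JOIN b ON a.j = b.j GROUP BY a.i;
+ *
+ * the NULL-extended rows generated for unmatched rows of 'a' never pass
+ * through a partial aggregation pushed down to 'b', so the top-level
+ * finalization would see different input than in a non-eager plan.
+ */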
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
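+/*
+ * For illustration (a hypothetical example), when pushing avg(b.y) down to
+ * rel 'b' for
+ *
+ *   SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j GROUP BY a.i;
+ *
+ * 'target' ends up containing b.j (the partial aggregate itself is added by
+ * the caller), while 'agg_input' ends up containing b.j and b.y.
+ */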
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ */
+ SortGroupClause *sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators((castNode(Var, expr))->vartype,
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be included in the grouping keys; otherwise it would not
+ * be available as input to the join clause above the partial aggregation.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final output. So we
+ * include "relation 0", which represents the final output, in the relids
+ * that are subtracted out below.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
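+/*
+ * For instance (illustrative), given GROUP BY a.i and a join qual
+ * a.i = b.j, the Var b.j is known equal to the grouping expression a.i via
+ * their equivalence class, so b.j can serve as the grouping key when
+ * partially aggregating rel 'b'.
+ */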
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index af227b1f24..2796448056 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 667e0dc40a..2e9df56cf4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1951ae7c11..815c14c71d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -387,6 +387,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -429,6 +438,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ /*
+ * list of grouped-relation RelAggInfos, with one instance per item of the
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
@@ -1079,6 +1094,56 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
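+ *
+ * Grouped paths are built by applying partial aggregation on top of paths
+ * that produce "agg_input"; the resulting paths emit "target".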
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3147,6 +3212,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * AggClauseInfo
+ * Information about an aggregate expression that appears in the targetlist
+ * or the HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * GroupExprInfo
+ * Information about a grouping expression that appears in the GROUP BY
+ * clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index f00bd55f39..d5282e916b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,10 +310,18 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -349,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 970499c469..9392a27a4d 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY an equivalent key from the other relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..1dda69e7c2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 2429ec2bba..d5697e5655 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY an equivalent key from the other relation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 502c748ecd..7eec1281e6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1061,6 +1062,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -2371,6 +2373,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
@@ -2379,6 +2382,7 @@ RelInfo
RelInfoArr
RelInfoEntry
RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Wed, Aug 21, 2024 at 3:11 AM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
Here are some initial, high-level thoughts about this patch set.
1. As far as I can see, there's no real performance testing on this
thread. I expect that it's possible to show an arbitrarily large gain
for the patch by finding a case where partial aggregation is way
better than anything we currently know, but that's not very
interesting. What I think would be useful to do is find a corpus of
existing queries on an existing data set and try them with and without
the patch and see which query plans change and whether they're
actually better. For example, maybe TPC-H or the subset of TPC-DS that
we can actually run would be a useful starting point. One could then
also measure how much the planning time increases with the patch to
get a sense of what the overhead of enabling this feature would be.
Even if it's disabled by default, people aren't going to want to
enable it if it causes planning times to become much longer on many
queries for which there is no benefit.
2. I think there might be techniques we could use to limit planning
effort at an earlier stage when the approach doesn't appear promising.
For example, if the proposed grouping column is already unique, the
exercise is pointless (I think). Ideally we'd like to detect that
without even creating the grouped_rel. But the proposed grouping
column might also be *mostly* unique. For example, consider a table
with a million rows and a column 500,000 distinct values. I suspect it
will be difficult for partial aggregation to work out to a win in a
case like this, because I think that the cost of performing the
partial aggregation will not reduce the cost either of the final
aggregation or of the intervening join steps by enough to compensate.
It would be best to find a way to avoid generating a lot of rels and
paths in cases where there's really not much hope of a win.
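To make that concrete, a gate along these lines might be applied before
building a grouped rel (a minimal sketch only: the function name and the
0.5 cutoff are illustrative, not anything in the patch; it merely assumes
estimate_num_groups() gives a usable estimate):

static bool
eager_agg_looks_promising(PlannerInfo *root, RelOptInfo *rel,
                          List *group_exprs)
{
    /* rel->rows is available once the rel's size has been estimated */
    double      dNumGroups = estimate_num_groups(root, group_exprs,
                                                 rel->rows, NULL, NULL);

    /*
     * If grouping is expected to remove less than half of the input
     * rows, the partial aggregation step is unlikely to pay for itself,
     * so don't bother creating the grouped relation at all.
     */
    return dNumGroups < rel->rows * 0.5;
}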
One could, perhaps, imagine going further with this by postponing
eager aggregation planning until after regular paths have been built,
so that we have good cardinality estimates. Suppose the query joins a
single fact table to a series of dimension tables. The final plan thus
uses the fact table as the driving table and joins to the dimension
tables one by one. Do we really need to consider partial aggregation
at every level? Perhaps just where there's been a significant row
count reduction since the last time we tried it, but at the next level
the row count will increase again?
Maybe there are other heuristics we could use in addition or instead.
3. In general, we are quite bad at estimating what will happen to the
row count after an aggregation, and we have no real idea what the
distribution of values will be. That might be a problem for this
patch, because it seems like the decisions we will make about where to
perform the partial aggregation might end up being quite random. At
the top of the join tree, I'll need to compare directly aggregating
the best join path with various paths that involve a finalize
aggregation step at the top and a partial aggregation step further
down. But my cost estimates and row counts for the partial aggregate
steps seem like they will often be quite poor, which means that the
plans that use those partial aggregate steps might also be quite poor.
Even if they're not, I fear that comparing the cost of those
PartialAggregate-Join(s)-FinalizeAggregate paths to the direct
Aggregate path will look too much like comparing random numbers. We
need to know whether the combination of the FinalizeAggregate step and
the PartialAggregate step will be more or less expensive than a plain
old Aggregate, but how can we tell that if we don't have accurate
cardinality estimates?
Thanks for working on this.
--
Robert Haas
EDB: http://www.enterprisedb.com
Richard Guo <guofenglinux@gmail.com> wrote on Wed, Aug 21, 2024 at 15:11:
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I had a self-review of this patchset and made some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
Recently, I ran some benchmark tests, mainly on TPC-H and TPC-DS.
The TPC-H tests show no plan diffs, so I did not continue testing on TPC-H.
The TPC-DS (10GB) tests show 22 plan diffs, as below:
4.sql, 5.sql, 8.sql, 11.sql, 19.sql, 23.sql, 31.sql,
33.sql, 39.sql, 45.sql, 46.sql, 47.sql, 53.sql,
56.sql, 57.sql, 60.sql, 63.sql, 68.sql, 74.sql, 77.sql, 80.sql, 89.sql
I haven't looked at all of them; I just picked a few of the simpler
plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a large
gain, but rather a slight performance regression.
I will continue to benchmark this feature.
[1]: https://github.com/tenderwg/eager_agg
--
Tender Wang
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <tndrwang@gmail.com> wrote:
Recently, I ran some benchmark tests, mainly on TPC-H and TPC-DS.
The TPC-H tests show no plan diffs, so I did not continue testing on TPC-H.
Interesting to know.
The TPC-DS (10GB) tests show 22 plan diffs, as below:
4.sql, 5.sql, 8.sql, 11.sql, 19.sql, 23.sql, 31.sql, 33.sql, 39.sql, 45.sql, 46.sql, 47.sql, 53.sql,
56.sql, 57.sql, 60.sql, 63.sql, 68.sql, 74.sql, 77.sql, 80.sql, 89.sql
OK.
I haven't looked at all of them; I just picked a few of the simpler plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a large gain, but rather a slight
performance regression.
Yeah, this is one of the things I was worried about in my previous
reply to Richard. It would be worth Richard, or someone, probing into
exactly why that's happening. My fear is that we just don't have good
enough estimates to make good decisions, but there might well be
another explanation.
I will continue to benchmark this feature.
Thanks!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Aug 23, 2024 at 11:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
Here are some initial, high-level thoughts about this patch set.
Thank you for your review and feedback! It helps a lot in moving this
work forward.
1. As far as I can see, there's no real performance testing on this
thread. I expect that it's possible to show an arbitrarily large gain
for the patch by finding a case where partial aggregation is way
better than anything we currently know, but that's not very
interesting. What I think would be useful to do is find a corpus of
existing queries on an existing data set and try them with and without
the patch and see which query plans change and whether they're
actually better. For example, maybe TPC-H or the subset of TPC-DS that
we can actually run would be a useful starting point. One could then
also measure how much the planning time increases with the patch to
get a sense of what the overhead of enabling this feature would be.
Even if it's disabled by default, people aren't going to want to
enable it if it causes planning times to become much longer on many
queries for which there is no benefit.
Right. I haven’t had time to run any benchmarks yet, but that is
something I need to do.
2. I think there might be techniques we could use to limit planning
effort at an earlier stage when the approach doesn't appear promising.
For example, if the proposed grouping column is already unique, the
exercise is pointless (I think). Ideally we'd like to detect that
without even creating the grouped_rel. But the proposed grouping
column might also be *mostly* unique. For example, consider a table
with a million rows and a column with 500,000 distinct values. I suspect it
will be difficult for partial aggregation to work out to a win in a
case like this, because I think that the cost of performing the
partial aggregation will not reduce the cost either of the final
aggregation or of the intervening join steps by enough to compensate.
It would be best to find a way to avoid generating a lot of rels and
paths in cases where there's really not much hope of a win.
One could, perhaps, imagine going further with this by postponing
eager aggregation planning until after regular paths have been built,
so that we have good cardinality estimates. Suppose the query joins a
single fact table to a series of dimension tables. The final plan thus
uses the fact table as the driving table and joins to the dimension
tables one by one. Do we really need to consider partial aggregation
at every level? Perhaps just where there's been a significant row
count reduction since the last time we tried it, but at the next level
the row count will increase again?
Maybe there are other heuristics we could use in addition or instead.
Yeah, one of my concerns with this work is that it can use
significantly more CPU time and memory during planning once enabled.
It would be great if we have some efficient heuristics to limit the
effort. I'll work on that next and see what happens.
3. In general, we are quite bad at estimating what will happen to the
row count after an aggregation, and we have no real idea what the
distribution of values will be. That might be a problem for this
patch, because it seems like the decisions we will make about where to
perform the partial aggregation might end up being quite random. At
the top of the join tree, I'll need to compare directly aggregating
the best join path with various paths that involve a finalize
aggregation step at the top and a partial aggregation step further
down. But my cost estimates and row counts for the partial aggregate
steps seem like they will often be quite poor, which means that the
plans that use those partial aggregate steps might also be quite poor.
Even if they're not, I fear that comparing the cost of those
PartialAggregate-Join(s)-FinalizeAggregate paths to the direct
Aggregate path will look too much like comparing random numbers. We
need to know whether the combination of the FinalizeAggregate step and
the PartialAggregate step will be more or less expensive than a plain
old Aggregate, but how can we tell that if we don't have accurate
cardinality estimates?
Yeah, I'm concerned about this too. In addition to the inaccuracies
in aggregation estimates, our estimates for joins are sometimes not
very accurate either. All of this is likely to result in regressions
with eager aggregation in some cases. Currently I don't have a good
answer to this problem. Maybe we can run some benchmarks first and
investigate the regressions discovered on a case-by-case basis to better
understand the specific issues.
Thanks
Richard
On Wed, Aug 28, 2024 at 11:57 AM Tender Wang <tndrwang@gmail.com> wrote:
Recently, I ran some benchmark tests, mainly on TPC-H and TPC-DS.
The TPC-H tests show no plan diffs, so I did not continue testing on TPC-H.
The TPC-DS (10GB) tests show 22 plan diffs, as below:
4.sql, 5.sql, 8.sql, 11.sql, 19.sql, 23.sql, 31.sql, 33.sql, 39.sql, 45.sql, 46.sql, 47.sql, 53.sql,
56.sql, 57.sql, 60.sql, 63.sql, 68.sql, 74.sql, 77.sql, 80.sql, 89.sql
I haven't looked at all of them; I just picked a few of the simpler plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a large gain, but rather a slight performance regression.
I will continue to benchmark this feature.
Thank you for running the benchmarks. That really helps a lot.
Thanks
Richard
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <tndrwang@gmail.com> wrote:
I haven't looked at all of them; I just picked a few of the simpler plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a large gain, but rather a slight performance regression.
Yeah, this is one of the things I was worried about in my previous
reply to Richard. It would be worth Richard, or someone, probing into
exactly why that's happening. My fear is that we just don't have good
enough estimates to make good decisions, but there might well be
another explanation.
It's great that we have a query to probe into. Your guess is likely
correct: it may be caused by poor estimates.
Tender, would you please help provide the outputs of
EXPLAIN (COSTS ON, ANALYZE)
on 19.sql with and without eager aggregation?
I will continue to benchmark this feature.
Thanks again for running the benchmarks.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> wrote on Thu, Aug 29, 2024 at 10:46:
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <tndrwang@gmail.com> wrote:
I haven't looked at all of them; I just picked a few of the simpler
plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a
large gain, but rather a slight performance regression.
Yeah, this is one of the things I was worried about in my previous
reply to Richard. It would be worth Richard, or someone, probing into
exactly why that's happening. My fear is that we just don't have good
enough estimates to make good decisions, but there might well be
another explanation.
It's great that we have a query to probe into. Your guess is likely
correct: it may be caused by poor estimates.
Tender, would you please help provide the outputs of
EXPLAIN (COSTS ON, ANALYZE)
on 19.sql with and without eager aggregation?
Yeah, in [1]https://github.com/tenderwg/eager_agg, 19_off.out and 19_on.out are the outputs of
EXPLAIN (COSTS OFF, ANALYZE).
I will run EXPLAIN (COSTS ON, ANALYZE) tests and upload them later today.
[1]: https://github.com/tenderwg/eager_agg
--
Tender Wang
Richard Guo <guofenglinux@gmail.com> wrote on Thu, Aug 29, 2024 at 10:46:
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <tndrwang@gmail.com> wrote:
I haven't looked at all of them; I just picked a few of the simpler
plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a
large gain, but rather a slight performance regression.
Yeah, this is one of the things I was worried about in my previous
reply to Richard. It would be worth Richard, or someone, probing into
exactly why that's happening. My fear is that we just don't have good
enough estimates to make good decisions, but there might well be
another explanation.
It's great that we have a query to probe into. Your guess is likely
correct: it may be caused by poor estimates.
Tender, would you please help provide the outputs of
EXPLAIN (COSTS ON, ANALYZE)
on 19.sql with and without eager aggregation?
I uploaded the EXPLAIN (COSTS ON, ANALYZE) output to [1].
I ran the same query three times and chose the result of the third run.
You can check 19_off_explain.out and 19_on_explain.out.
[1]: https://github.com/tenderwg/eager_agg
--
Tender Wang
On Wed, Aug 28, 2024 at 10:26 PM Richard Guo <guofenglinux@gmail.com> wrote:
Yeah, I'm concerned about this too. In addition to the inaccuracies
in aggregation estimates, our estimates for joins are sometimes not
very accurate either. All of this is likely to result in regressions
with eager aggregation in some cases. Currently I don't have a good
answer to this problem. Maybe we can run some benchmarks first and
investigate the regressions discovered on a case-by-case basis to better
understand the specific issues.
While it's true that we can make mistakes during join estimation, I
believe aggregate estimation tends to be far worse.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 28, 2024 at 11:38 PM Tender Wang <tndrwang@gmail.com> wrote:
I uploaded the EXPLAIN (COSTS ON, ANALYZE) output to [1].
I ran the same query three times and chose the result of the third run.
You can check 19_off_explain.out and 19_on_explain.out.
So, in 19_off_explain.out, we got this:
-> Finalize GroupAggregate (cost=666986.48..667015.35
rows=187 width=142) (actual time=272.649..334.318 rows=900 loops=1)
-> Gather Merge (cost=666986.48..667010.21 rows=187
width=142) (actual time=272.644..333.847 rows=901 loops=1)
-> Partial GroupAggregate
(cost=665986.46..665988.60 rows=78 width=142) (actual
time=266.379..267.476 rows=300 loops=3)
-> Sort (cost=665986.46..665986.65
rows=78 width=116) (actual time=266.367..266.583 rows=5081 loops=3)
And in 19_on_explain.out, we got this:
-> Finalize GroupAggregate (cost=666987.03..666989.77
rows=19 width=142) (actual time=285.018..357.374 rows=900 loops=1)
-> Gather Merge (cost=666987.03..666989.25 rows=19
width=142) (actual time=285.000..352.793 rows=15242 loops=1)
-> Sort (cost=665987.01..665987.03 rows=8
width=142) (actual time=273.391..273.580 rows=5081 loops=3)
-> Nested Loop (cost=665918.00..665986.89
rows=8 width=142) (actual time=252.667..269.719 rows=5081 loops=3)
-> Nested Loop
(cost=665917.85..665985.43 rows=8 width=157) (actual
time=252.656..264.755 rows=5413 loops=3)
-> Partial GroupAggregate
(cost=665917.43..665920.10 rows=82 width=150) (actual
time=252.643..255.627 rows=5413 loops=3)
-> Sort
(cost=665917.43..665917.64 rows=82 width=124) (actual
time=252.636..252.927 rows=5413 loops=3)
So, the patch was expected to cause the number of rows passing through
the Gather Merge to decrease from 187 to 19, but actually caused the
number of rows passing through the Gather Merge to increase from 901
to 15242. When the PartialAggregate was positioned at the top of the
join tree, it reduced the number of rows from 5081 to 300; but when it
was pushed down below two joins, it didn't reduce the row count at
all, and the subsequent two joins reduced it by less than 10%.
Now, you could complain about the fact that the Parallel Hash Join
isn't well-estimated here, but my question is: why does the planner
think that the PartialAggregate should go specifically here? In both
plans, the PartialAggregate isn't expected to change the row count.
And if that is true, then it's going to be cheapest to do it at the
point where the joins have reduced the row count to the minimum value.
Here, that would be at the top of the plan tree, where we have only
5081 estimated rows, but instead, the patch chooses to do it as soon
as we have all of the grouping columns, when we still have 5413 rows.
I don't understand why that path wins on cost, unless it's just that
the paths compare fuzzily the same, in which case it kind of goes to
my earlier point about not really having the statistics to know which
way is actually going to be better.
--
Robert Haas
EDB: http://www.enterprisedb.com
Richard Guo <guofenglinux@gmail.com> wrote on Wed, Aug 21, 2024 at 15:11:
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I had a self-review of this patchset and made some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
The v11-0002 patch failed to apply with git am on HEAD (6c2b5edecc).
tender@iZ2ze6la2dizi7df9q3xheZ:/workspace/postgres$ git am
v11-0002-Implement-Eager-Aggregation.patch
Applying: Implement Eager Aggregation
error: patch failed: src/test/regress/parallel_schedule:119
error: src/test/regress/parallel_schedule: patch does not apply
Patch failed at 0001 Implement Eager Aggregation
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
--
Thanks,
Tender Wang
Richard Guo <guofenglinux@gmail.com> wrote on Wed, Aug 21, 2024 at 15:11:
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I had a self-review of this patchset and made some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
I reviewed the v11 patch set; here are a few of my thoughts:
1. In setup_eager_aggregation(), before calling create_agg_clause_infos(),
some checks are performed to see whether eager aggregation is applicable.
Could we move those checks into a function, for example can_eager_agg(),
like can_partial_agg() does?
2. I found that outside of joinrel.c we all use IS_DUMMY_REL, but in
joinrel.c Tom always uses is_dummy_rel(), while other committers use
IS_DUMMY_REL.
3. The attached patch does not consider FDWs when creating a path for
grouped_rel or grouped_join. Do we need to think about FDWs?
I haven't finished reviewing the patch set; I will continue studying this
feature.
--
Thanks,
Tender Wang
Richard Guo <guofenglinux@gmail.com> wrote on Wed, Aug 21, 2024 at 15:11:
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I had a self-review of this patchset and made some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
I continued reviewing the v11 patches. Here are some of my thoughts.
1. In make_one_rel(), we have the following code:
/*
* Build grouped base relations for each base rel if possible.
*/
setup_base_grouped_rels(root);
As far as I know, each base rel has at most one grouped base relation.
The comment could be changed to "Build a grouped base relation for each
base rel if possible."
2. According to the comments of generate_grouped_paths(), we may generate
paths for a grouped relation on top of the paths of a join relation. So the
"rel_plain" argument in generate_grouped_paths() may be confusing: "plain"
usually means "base rel". How about renaming rel_plain to input_rel?
3. In create_partial_grouping_paths(), the partially_grouped_rel could
already have been created due to eager aggregation. If partially_grouped_rel
exists, its reltarget has already been built. So do we need the logic below?
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
* include Vars and Aggrefs needed in HAVING, which might not appear in
* the result tlist, and (2) the Aggrefs must be set in partial mode.
*/
partially_grouped_rel->reltarget =
make_partial_grouping_target(root, grouped_rel->reltarget,
extra->havingQual);
--
Thanks,
Tender Wang
Tender Wang <tndrwang@gmail.com> wrote on Wed, Sep 4, 2024 at 11:48:
Richard Guo <guofenglinux@gmail.com> wrote on Wed, Aug 21, 2024 at 15:11:
On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
I had a self-review of this patchset and made some refactoring,
especially to the function that creates the RelAggInfo structure for a
given relation. While there were no major changes, the code should
now be simpler.
I found a bug in the v10 patchset: when we generate the GROUP BY clauses
for the partial aggregation that is pushed down to a non-aggregated
relation, we may produce a clause with a tleSortGroupRef that
duplicates one already present in the query's groupClause, which would
cause problems.
Attached is the updated version of the patchset that fixes this bug
and includes further code refactoring.
The v11-0002 patch failed to apply with git am on HEAD (6c2b5edecc).
tender@iZ2ze6la2dizi7df9q3xheZ:/workspace/postgres$ git am
v11-0002-Implement-Eager-Aggregation.patch
Applying: Implement Eager Aggregation
error: patch failed: src/test/regress/parallel_schedule:119
error: src/test/regress/parallel_schedule: patch does not apply
Patch failed at 0001 Implement Eager Aggregation
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Since MERGE/SPLIT partition has been reverted, the tests *partition_merge*
and *partition_split* should be removed
from parallel_schedule. After doing the above, the 0002 patch can be
applied.
--
Thanks,
Tender Wang
On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <tndrwang@gmail.com> wrote:
I haven't looked at all of them; I just picked a few of the simpler plans (e.g. 19.sql, 45.sql).
For example, on 19.sql, eager aggregation pushdown doesn't yield a large gain, but rather a slight
performance regression.
Yeah, this is one of the things I was worried about in my previous
reply to Richard. It would be worth Richard, or someone, probing into
exactly why that's happening. My fear is that we just don't have good
enough estimates to make good decisions, but there might well be
another explanation.
Sorry, it took me some time to switch back to this thread.
I revisited the part about cost estimates for grouped paths in this
patch, and I found a big issue: the row estimate for a join path could
be significantly inaccurate if there is a grouped join path beneath
it.
The reason is that it is very tricky to set the size estimates for a
grouped join relation. For a non-grouped join relation, we know that
all its paths have the same rowcount estimate (well, in theory). But
this is not true for a grouped join relation. Suppose we have a
grouped join relation for t1/t2 join. There might be two paths for
it:
Aggregate
-> Join
-> Scan on t1
-> Scan on t2
Or
Join
-> Scan on t1
-> Aggregate
-> Scan on t2
These two paths can have very different rowcount estimates, and we
have no way of knowing which one to set for this grouped join
relation, because we do not know which path would be picked in the
final plan. This issue can be illustrated with the query below.
create table t (a int, b int, c int);
insert into t select i%10, i%10, i%10 from generate_series(1,1000)i;
analyze t;
set enable_eager_aggregate to on;
explain (costs on)
select sum(t2.c) from t t1 join t t2 on t1.a = t2.a join t t3 on t2.b
= t3.b group by t3.a;
QUERY PLAN
---------------------------------------------------------------------------------------
Finalize HashAggregate (cost=6840.60..6840.70 rows=10 width=12)
Group Key: t3.a
-> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12)
Join Filter: (t2.b = t3.b)
-> Partial HashAggregate (cost=1672.00..1672.10 rows=10 width=12)
Group Key: t2.b
-> Hash Join (cost=28.50..1172.00 rows=100000 width=8)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..16.00 rows=1000 width=4)
-> Hash (cost=16.00..16.00 rows=1000 width=12)
-> Seq Scan on t t2 (cost=0.00..16.00
rows=1000 width=12)
-> Materialize (cost=0.00..21.00 rows=1000 width=8)
-> Seq Scan on t t3 (cost=0.00..16.00 rows=1000 width=8)
(13 rows)
Look at the Nested Loop node:
-> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12)
How can a 10-row outer path joining a 1000-row inner path generate
1000000 rows? This is because we are using the plan of the first path
described above, and the rowcount estimate of the second path. What a
kluge!
To address this issue, one solution I’m considering is to recalculate
the row count estimate for a grouped join path using its outer and
inner paths. While this may seem expensive, it might not be that bad
since we will cache the results of the selectivity calculation. In
fact, this is already the approach we take for parameterized join
paths (see get_parameterized_joinrel_size).
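In rough code form, the idea is to size a grouped join path from its own
input paths, much as we already do for parameterized paths (a sketch only;
the variable names assume the outer/inner paths, sjinfo, and the join's
restrict clauses are at hand at path-creation time):

    /*
     * Derive the rowcount from this path's actual inputs instead of
     * inheriting the single estimate stored in the joinrel.
     */
    path->rows = get_parameterized_joinrel_size(root, joinrel,
                                                outer_path, inner_path,
                                                sjinfo, restrict_clauses);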
Any thoughts on this?
Thanks
Richard
On Thu, Sep 5, 2024 at 9:40 AM Tender Wang <tndrwang@gmail.com> wrote:
1. In setup_eager_aggregation(), before calling create_agg_clause_infos(),
some checks are performed to see whether eager aggregation is applicable.
Could we move those checks into a function, for example can_eager_agg(),
like can_partial_agg() does?
We can do this, but I'm not sure this would be better.
2. I found that outside of joinrel.c we all use IS_DUMMY_REL, but in
joinrel.c Tom always uses is_dummy_rel(), while other committers use
IS_DUMMY_REL.
They are essentially the same: IS_DUMMY_REL() is a macro that wraps
is_dummy_rel(). I think they are interchangeable, and I don’t have a
preference for which one is better.
3. The attached patch does not consider FDWs when creating a path for
grouped_rel or grouped_join. Do we need to think about FDWs?
We may add support for foreign relations in the future, but for now, I
think we'd better not expand the scope too much until we ensure that
everything is working correctly.
Thanks
Richard
On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <tndrwang@gmail.com> wrote:
1. In make_one_rel(), we have the following code:
/*
* Build grouped base relations for each base rel if possible.
*/
setup_base_grouped_rels(root);
As far as I know, each base rel has at most one grouped base relation.
The comment could be changed to "Build a grouped base relation for each base rel if possible."
Yeah, each base rel has only one grouped rel. However, there is a
comment nearby stating 'consider_parallel flags for each base rel',
which confuses me about whether it should be singular or plural in
this context. Perhaps someone more proficient in English could
clarify this.
2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped
relation on top of the paths of a join relation. So the "rel_plain" argument in
generate_grouped_paths() may be confusing: "plain" usually means "base rel". How about renaming
rel_plain to input_rel?
I don't think 'plain relation' necessarily means 'base relation'. In
this context I think it can mean 'non-grouped relation'. But maybe
I'm wrong.
3. In create_partial_grouping_paths(), the partially_grouped_rel could already have been created
due to eager aggregation. If partially_grouped_rel exists, its reltarget has already been built.
So do we need the logic below?
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
* include Vars and Aggrefs needed in HAVING, which might not appear in
* the result tlist, and (2) the Aggrefs must be set in partial mode.
*/
partially_grouped_rel->reltarget =
make_partial_grouping_target(root, grouped_rel->reltarget,
extra->havingQual);
Yeah, maybe we can avoid building the target list here for
partially_grouped_rel that is generated by eager aggregation.
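For example, a minimal guard might look like this (a sketch only; testing
reltarget->exprs is just one possible way to detect an already-built
target, and the actual fix may differ):

    /* Skip rebuilding the tlist if eager aggregation already built it. */
    if (partially_grouped_rel->reltarget->exprs == NIL)
        partially_grouped_rel->reltarget =
            make_partial_grouping_target(root, grouped_rel->reltarget,
                                         extra->havingQual);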
Thanks
Richard
On Fri, Sep 13, 2024 at 3:48 PM Tender Wang <tndrwang@gmail.com> wrote:
Since MERGE/SPLIT partition has been reverted, the tests *partition_merge* and *partition_split* should be removed
from parallel_schedule. After doing the above, the 0002 patch can be applied.
Yeah, that's what I need to do.
Thanks
Richard
On Wed, Sep 25, 2024 at 11:20 AM Richard Guo <guofenglinux@gmail.com> wrote:
Look at the Nested Loop node:
-> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12)
How can a 10-row outer path joining a 1000-row inner path generate
1000000 rows? This is because we are using the plan of the first path
described above, and the rowcount estimate of the second path. What a
kluge!
To address this issue, one solution I’m considering is to recalculate
the row count estimate for a grouped join path using its outer and
inner paths. While this may seem expensive, it might not be that bad
since we will cache the results of the selectivity calculation. In
fact, this is already the approach we take for parameterized join
paths (see get_parameterized_joinrel_size).
Here is an updated version of this patch that fixes the rowcount
estimate issue along these lines (see set_joinpath_size).
Now the Nested Loop node looks like:
-> Nested Loop (cost=1672.00..1840.60 rows=1000 width=12)
(actual time=119.685..122.841 rows=1000 loops=1)
Its rowcount estimate looks much more sane now.
But wait, why are we using nestloop here? My experience suggests that
hashjoin typically outperforms nestloop with input paths of this size
on this type of dataset.
The thing is, the first path (join-then-aggregate one) of the t1/t2
grouped join relation has a much smaller rowcount but more expensive
costs:
:path.rows 10
:path.disabled_nodes 0
:path.startup_cost 1672
:path.total_cost 1672.1
And the second path (aggregate-then-join one) has cheaper costs but
more rows.
:jpath.path.rows 10000
:jpath.path.disabled_nodes 0
:jpath.path.startup_cost 25.75
:jpath.path.total_cost 156.75
Both paths have survived the add_path() tournament for this relation,
and the second one is selected as the cheapest path by set_cheapest,
which mainly uses costs and then pathkeys as the selection criterion.
The rowcount estimate is not taken into account, which is reasonable
because unparameterized paths for the same relation usually have the
same rowcount estimate. And when creating hashjoins, we only consider
the cheapest input paths. This is why we are unable to generate a
hashjoin with the first path.
However, the situation changes with grouped relations, as different
paths of a grouped relation can have very different row counts. To
cope with this, I modified set_cheapest() to also find the fewest-row
unparameterized path if the relation is a grouped relation, and
include it in the cheapest_parameterized_paths list. It could be
argued that this will increase the overall planning time a lot because
it adds one more path to cheapest_parameterized_paths. But in many
cases the fewest-row path is the same path as cheapest_total_path, in
which case we do not need to add it again.
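The gist of the set_cheapest() change is something like the following (a
simplified sketch rather than the exact patch code; IS_GROUPED_REL comes
from this patch set):

    if (IS_GROUPED_REL(rel))
    {
        Path       *fewest_row_path = NULL;
        ListCell   *lc;

        /* find the unparameterized path with the lowest rowcount estimate */
        foreach(lc, rel->pathlist)
        {
            Path       *path = (Path *) lfirst(lc);

            if (path->param_info)
                continue;
            if (fewest_row_path == NULL ||
                path->rows < fewest_row_path->rows)
                fewest_row_path = path;
        }

        /* often this is just cheapest_total_path; avoid adding it twice */
        if (fewest_row_path && fewest_row_path != rel->cheapest_total_path)
            rel->cheapest_parameterized_paths =
                lappend(rel->cheapest_parameterized_paths, fewest_row_path);
    }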
And now the plan becomes:
explain (costs on)
select sum(t2.c) from t t1 join t t2 on t1.a = t2.a join t t3 on t2.b
= t3.b group by t3.a;
QUERY PLAN
---------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=1706.97..1707.07 rows=10 width=12)
Group Key: t3.a
-> Hash Join (cost=1672.22..1701.97 rows=1000 width=12)
Hash Cond: (t3.b = t2.b)
-> Seq Scan on t t3 (cost=0.00..16.00 rows=1000 width=8)
-> Hash (cost=1672.10..1672.10 rows=10 width=12)
-> Partial HashAggregate (cost=1672.00..1672.10
rows=10 width=12)
Group Key: t2.b
-> Hash Join (cost=28.50..1172.00 rows=100000 width=8)
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1 (cost=0.00..16.00
rows=1000 width=4)
-> Hash (cost=16.00..16.00 rows=1000 width=12)
-> Seq Scan on t t2
(cost=0.00..16.00 rows=1000 width=12)
(13 rows)
I believe this is the best plan we can find for this query on
this dataset.
I also made some changes to how grouped relations are stored in this
version of the patch.
Thanks
Richard
Attachments:
v12-0001-Implement-Eager-Aggregation.patch
From 20078e2b09402323302d8a29f412b3e0f46bf014 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v12] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in a dedicated list.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys. This ensures that we have the correct input for the upper joins
and that an aggregated row from the partial aggregation matches the
other side of the join if and only if each row in the partial group
does, which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many grouped relations, we
introduce a RelInfoList structure, which encapsulates both a list and
a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for grouped relations.
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/README | 79 +
src/backend/optimizer/geqo/geqo_eval.c | 98 +-
src/backend/optimizer/path/allpaths.c | 448 +++++-
src/backend/optimizer/path/costsize.c | 102 +-
src/backend/optimizer/path/joinrels.c | 131 ++
src/backend/optimizer/plan/initsplan.c | 259 ++++
src/backend/optimizer/plan/planmain.c | 17 +-
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 47 +-
src/backend/optimizer/util/relnode.c | 708 ++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 142 +-
src/include/optimizer/pathnode.h | 7 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 7 +-
23 files changed, 3572 insertions(+), 157 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..48b0488184 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6092,7 +6092,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..008c700aea 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys. This ensures that we
+have the correct input for the upper joins and that an aggregated row from the
+partial aggregation matches the other side of the join if and only if each row
+in the partial group does, which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest
+non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final paths will compete in the usual
+way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..cdc9543135 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list and
+ * root->grouped_rel_list, which may or may not already contain some
+ * entries. The newly added entries will be recycled by the
+ * MemoryContextDelete below, so we must ensure that each list within the
+ * RelInfoList structures is restored to its former state before exiting.
+ * We can do this by truncating each list to its original length. NOTE
+ * this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables within the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels or
+ * grouped rels, which we very likely are, new hash tables will get built
+ * and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel = save_relinfolist(root->grouped_rel_list);
+
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each list in join_rel_list and grouped_rel_list to its
+ * former state, and put back the original hashtables if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(root->grouped_rel_list, &save_grouped_rel);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 172edb643a..1bd2e63c6f 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3306,6 +3394,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
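+ *
+ * For illustration (hypothetical query and plan shape): given
+ *   SELECT a.x, sum(b.y) FROM a JOIN b ON a.i = b.j GROUP BY a.x;
+ * this function may build, on top of b's paths, something like
+ *   Partial HashAggregate
+ *     -> Seq Scan on b
+ * whose output is then joined to 'a' and finalized above the join.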
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3414,9 +3807,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
@@ -3465,6 +3859,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3485,6 +3883,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4353,6 +4772,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..3c38ed7843 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -180,6 +180,9 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost);
+static void set_joinpath_size(PlannerInfo *root, Path *path,
+ Path *outer_path, Path *inner_path,
+ SpecialJoinInfo *sjinfo, List *restrict_clauses);
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -3370,19 +3373,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
if (inner_path_rows <= 0)
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/* cost of inner-relation source data (we already dealt with outer rel) */
@@ -3822,19 +3814,8 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/*
* Compute cost of the mergequals and qpquals (other restriction clauses)
@@ -4254,19 +4235,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
path->jpath.path.disabled_nodes = workspace->disabled_nodes;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/* mark the path with estimated # of batches */
path->num_batches = numbatches;
@@ -5014,6 +4984,60 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
*qpqual_cost = baserel->baserestrictcost;
}
+/*
+ * set_joinpath_size
+ * Set the correct row estimate for the given join path.
+ *
+ * 'path' is the join path under consideration.
+ * 'outer_path', 'inner_path' are Paths that produce the relations being
+ * joined.
+ * 'sjinfo' is any SpecialJoinInfo relevant to this join.
+ * 'restrict_clauses' lists the join clauses that need to be applied at the
+ * join node.
+ *
+ * Note that for a grouped join relation, its paths could have very different
+ * rowcount estimates, so we need to calculate the rowcount estimate using
+ * the pair of input paths provided.
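+ *
+ * For example, a path that joins an already-grouped input to a plain input
+ * can produce a very different number of rows than a path that joins the
+ * two plain inputs, so the parent relation's row estimate cannot simply be
+ * reused here.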
+ */
+static void
+set_joinpath_size(PlannerInfo *root, Path *path,
+ Path *outer_path, Path *inner_path,
+ SpecialJoinInfo *sjinfo, List *restrict_clauses)
+{
+ if (IS_GROUPED_REL(path->parent))
+ {
+ /*
+ * Estimate the number of rows of this grouped join path as the sizes
+ * of the input paths times the selectivity of the clauses that have
+ * ended up at this join node.
+ */
+ path->rows = calc_joinrel_size_estimate(root,
+ path->parent,
+ outer_path->parent,
+ inner_path->parent,
+ outer_path->rows,
+ inner_path->rows,
+ sjinfo,
+ restrict_clauses);
+ }
+ else if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = path->parent->rows;
+
+ /*
+ * For partial paths, scale row estimate. We can skip this for grouped
+ * join paths.
+ */
+ if (path->parallel_workers > 0 && !IS_GROUPED_REL(path->parent))
+ {
+ double parallel_divisor = get_parallel_divisor(path);
+
+ path->rows =
+ clamp_row_est(path->rows / parallel_divisor);
+ }
+}
+
/*
* compute_semi_anti_join_factors
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..43c9fa9526 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +889,125 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped);
+ }
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids);
+ rel2_grouped = find_grouped_rel(root, rel2->relids);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * Joining two grouped relations is currently not supported. Grouping one
+ * side would alter the occurrence of the other side's aggregate transient
+ * states in the final aggregation input. While this issue could be
+ * addressed by adjusting the transient states, it is not deemed
+ * worthwhile for now.
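+ *
+ * For example, if both inputs were grouped, each partial aggregate state
+ * from one side would be joined to one row per group of the other side,
+ * rather than one row per matching tuple, yielding wrong final results.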
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * It shouldn't happen that we have marked rel1_grouped as dummy in
+ * populate_joinrel_with_paths due to provably constant-false join
+ * restrictions, so we won't end up with a plan that has an Aggref in
+ * a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * It shouldn't happen that we have marked rel2_grouped as dummy in
+ * populate_joinrel_with_paths due to provably constant-false join
+ * restrictions, so we won't end up with a plan that has an Aggref in
+ * a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1800,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index f3b9821498..ad468d3796 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -327,6 +330,262 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
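+ *
+ * For illustration (hypothetical query), something like
+ *   SELECT a.x, count(*) FROM a JOIN b ON a.i = b.j GROUP BY a.x;
+ * may qualify, while queries with grouping sets, DISTINCT or ORDER BY
+ * aggregates, or set-returning functions in the targetlist are rejected
+ * by the checks below.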
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by the user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * collect plain Vars for future reference
+ */
+ if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys, as
+ * defined by the equality operator, implies bitwise equality.
+ * Otherwise, if we put keys with different byte images into the same
+ * group, we may lose some information that could be needed to
+ * evaluate upper qual clauses.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..a8f102beb8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,12 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list = makeNode(RelInfoList);
+ root->grouped_rel_list->items = NIL;
+ root->grouped_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -76,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -257,6 +264,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d92d43a17e..922cb7a793 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -227,7 +227,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4075,9 +4074,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4158,23 +4155,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7074,16 +7064,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7212,7 +7228,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7220,7 +7236,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7262,19 +7278,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7324,6 +7338,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * It is possible that the partially_grouped_rel created by eager
+ * aggregation is dummy.  In this case we just set it to NULL; it may be
+ * created again by the logic below if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7347,19 +7376,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7368,6 +7405,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4989722637..4884d9ddea 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..673e181b32 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
*
+ * For grouped relations, cheapest_parameterized_paths also always includes
+ * the fewest-row unparameterized path, if there is one.  Different paths of
+ * a grouped relation can have very different row counts, and in some cases
+ * the cheapest-total unparameterized path may not be the one with the
+ * fewest rows.
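+ *
+ * For example, a grouped path that applies partial aggregation early may
+ * be cheapest overall and yet produce more rows than one that aggregates
+ * on top of a join; the fewest-row path can be the more useful input for
+ * further joins.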
+ *
* This is normally called only after we've finished constructing the path
* list for the rel node.
*/
@@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel)
Path *cheapest_startup_path;
Path *cheapest_total_path;
Path *best_param_path;
+ Path *fewest_row_path;
List *parameterized_paths;
ListCell *p;
@@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel)
elog(ERROR, "could not devise a query plan for the given query");
cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
+ fewest_row_path = NULL;
parameterized_paths = NIL;
foreach(p, parent_rel->pathlist)
@@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path == NULL)
{
cheapest_startup_path = cheapest_total_path = path;
+ if (IS_GROUPED_REL(parent_rel))
+ fewest_row_path = path;
continue;
}
@@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel)
compare_pathkeys(cheapest_total_path->pathkeys,
path->pathkeys) == PATHKEYS_BETTER2))
cheapest_total_path = path;
+
+ /*
+ * Find the fewest-row unparameterized path for a grouped
+ * relation. If we find two paths of the same row count, try to
+ * keep the one with the cheaper total cost; if the costs are
+ * identical, keep the better-sorted one.
+ */
+ if (IS_GROUPED_REL(parent_rel))
+ {
+ if (fewest_row_path->rows > path->rows)
+ fewest_row_path = path;
+ else if (fewest_row_path->rows == path->rows)
+ {
+ cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST);
+ if (cmp > 0 ||
+ (cmp == 0 &&
+ compare_pathkeys(fewest_row_path->pathkeys,
+ path->pathkeys) == PATHKEYS_BETTER2))
+ fewest_row_path = path;
+ }
+ }
}
}
@@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path)
parameterized_paths = lcons(cheapest_total_path, parameterized_paths);
+ /* Add fewest-row unparameterized path, if any, to parameterized_paths */
+ if (fewest_row_path && fewest_row_path != cheapest_total_path)
+ parameterized_paths = lcons(fewest_row_path, parameterized_paths);
+
/*
* If there is no unparameterized path, use the best parameterized path as
* cheapest_total_path (but not as cheapest_startup_path).
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..6d357db28c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,19 +28,26 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
-typedef struct JoinHashEntry
+/*
+ * An entry in a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelHashEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelHashEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -83,6 +91,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -276,6 +292,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +423,92 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we could not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -479,11 +582,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +594,46 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelHashEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing RelOptInfos */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry corresponding to 'relids'.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +643,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +671,28 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->grouped_rel_list, relids);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -619,31 +743,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
+ * add_rel_info
+ * Add given relation to the list, and also add it to the auxiliary
+ * hashtable if there is one.
*/
static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
/* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
+ if (list->hash)
{
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = joinrel;
+ hentry->rel = rel;
}
}
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ add_rel_info(root->grouped_rel_list, rel);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -755,6 +901,7 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1086,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2508,3 +2656,471 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
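+ *
+ * On success, the returned RelAggInfo carries the target of the grouped
+ * paths, the input target for partial aggregation ('agg_input'), the
+ * grouping clauses and expressions, and an estimate of the number of
+ * grouped rows.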
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must have
+ * already been created if that was possible.  So we can just use the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) rel_grouped->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
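+ /* sortgroupref 0: aggregates are not grouping columns */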
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
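+ *
+ * For example, in
+ *   SELECT t2.b, avg(t2.c)
+ *   FROM t1 LEFT JOIN t2 ON t1.b = t2.b GROUP BY t2.b;
+ * the NULL-extended rows produced for t1 rows without a match in t2 all
+ * belong to the t2.b = NULL group, yet a partial aggregation performed
+ * on t2 alone would never see them.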
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
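+ /*
+ * If some outer join that can null this baserel is not contained in
+ * this rel, the baserel is on the nullable side of that join.
+ */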
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated at this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /*
+ * Identify the max sortgroupref currently in use, so that any brand-new
+ * grouping columns created below can be assigned unique sortgrouprefs.
+ */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as a grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed by an upper join but is neither in the
+ * GROUP BY clause nor known equal to any grouping expression via
+ * equivalence classes (otherwise it would have already been included
+ * in the targets above). We need to create a special SortGroupClause
+ * for this expression.
+ */
+ SortGroupClause *sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
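+ /*
+ * Look up ordering/equality operators for the Var's type; only the
+ * equality operator is strictly required here.
+ */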
+ get_sort_group_operators((castNode(Var, expr))->vartype,
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
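+ /*
+ * If we broke out of the loop above, 'lc' points at the matching cell;
+ * otherwise foreach left it NULL. The Var qualifies only if some
+ * aggregate references it and it is absent from the plain Vars of the
+ * targetlist and havingQual.
+ */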
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be included among the grouping keys of the pushed-down
+ * aggregation; otherwise it would not be available as input to the join
+ * clause above.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is needed only in the final output, which
+ * attr_needed represents as "relation 0".  So we add relid 0 to the set
+ * of relids to be ignored.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
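+ /*
+ * The Var is needed by an upper join if it is needed anywhere beyond
+ * this rel and the final output.
+ */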
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58..7896c48fe2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 667e0dc40a..2e9df56cf4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..b2a51b121e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relations.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,16 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+
+ /*
+ * grouped_rel_list is a list of all grouped-relation RelOptInfos we have
+ * considered in this planning run. This is only used by eager
+ * aggregation.
+ */
+ RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -373,6 +393,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -998,6 +1027,12 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1071,6 +1106,62 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3140,6 +3231,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * Information about an aggregate expression that appears in the targetlist
+ * or in the HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * Information about a grouping expression that appears in the GROUP BY
+ * clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 1035e6560c..d3c05a61ba 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,10 +314,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -353,4 +359,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index a78e90610f..1e7d010ecb 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index aafc173792..cedcd88ebf 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..1dda69e7c2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f38104ba0..7bff358315 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b6135f0347..89232ae13d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1062,6 +1063,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -1293,7 +1295,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2374,12 +2375,16 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
+RelHashEntry
RelIdCacheEnt
RelInfo
RelInfoArr
+RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is an updated version of this patch that fixes the rowcount
estimate issue in this routine (see set_joinpath_size).
I have been working on heuristics to limit the planning effort of
eager aggregation. One simple yet effective approach I'm thinking of
is to consider a grouped path as NOT useful if its row reduction
ratio falls below a predefined minimum threshold. Currently I'm
using 0.5 as the threshold, but I'm open to other values.
+/* Minimum row reduction ratio at which a grouped path is considered useful */
+#define EAGER_AGGREGATE_RATIO 0.5
When deciding whether to generate a grouped relation for a base or
join relation, we calculate the row reduction ratio of its grouped
paths. If the ratio is less than EAGER_AGGREGATE_RATIO, we skip
generating the grouped relation for a base relation, and generate
the grouped relation for a join relation only if we can produce
grouped paths by joining its input relations. In either case, we do
NOT generate any grouped paths by adding partial aggregation on top
of the non-grouped paths. This reduces the number of grouped paths,
as well as the number of grouped relations, in many cases where
eager aggregation would not help much. A standalone sketch of the
threshold test is included below.
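To make the threshold concrete, here is a minimal standalone sketch
of the test (the function name and arguments are made up for
illustration, not the patch's actual code; in the planner, the row
counts come from the optimizer's estimates):

/*
 * Hypothetical sketch of the EAGER_AGGREGATE_RATIO test; not the
 * patch's actual code.  A grouped path is considered useful only if
 * partial aggregation is estimated to eliminate at least this
 * fraction of its input rows.
 */
#include <stdbool.h>

#define EAGER_AGGREGATE_RATIO 0.5

static bool
grouped_paths_useful(double input_rows, double grouped_rows)
{
    /* Be defensive about degenerate row estimates. */
    if (input_rows <= 0.0 || grouped_rows <= 0.0)
        return false;

    /*
     * Row reduction ratio: the fraction of input rows removed by the
     * partial aggregation.  For example, 1000 input rows collapsing
     * into 300 groups gives a ratio of 0.7, which passes the 0.5
     * threshold; 1000 rows collapsing into 800 groups gives 0.2,
     * which does not.
     */
    return (1.0 - grouped_rows / input_rows) >= EAGER_AGGREGATE_RATIO;
}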
Thanks
Richard
Attachments:
v13-0001-Implement-Eager-Aggregation.patch
From a3485dba098957069f25d80c25c47340ab9ccf76 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v13] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in a dedicated list.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys. This ensures that we have the correct input for the upper joins
and that an aggregated row from the partial aggregation matches the
other side of the join if and only if each row in the partial group
does, which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many grouped relations, we
introduce a RelInfoList structure, which encapsulates both a list and
a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for grouped relations.
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/README | 79 +
src/backend/optimizer/geqo/geqo_eval.c | 98 +-
src/backend/optimizer/path/allpaths.c | 455 +++++-
src/backend/optimizer/path/costsize.c | 102 +-
src/backend/optimizer/path/joinrels.c | 147 ++
src/backend/optimizer/plan/initsplan.c | 260 ++++
src/backend/optimizer/plan/planmain.c | 17 +-
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 47 +-
src/backend/optimizer/util/relnode.c | 733 ++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 148 +-
src/include/optimizer/pathnode.h | 7 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 7 +-
23 files changed, 3626 insertions(+), 158 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index adc62576d1..48b0488184 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6092,7 +6092,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..008c700aea 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys. This ensures that we
+have the correct input for the upper joins and that an aggregated row from the
+partial aggregation matches the other side of the join if and only if each row
+in the partial group does, which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest
+non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final paths will compete in the usual
+way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..cdc9543135 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list and
+ * root->grouped_rel_list, which may or may not already contain some
+ * entries. The newly added entries will be recycled by the
+ * MemoryContextDelete below, so we must ensure that each list within the
+ * RelInfoList structures is restored to its former state before exiting.
+ * We can do this by truncating each list to its original length. NOTE
+ * this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables within the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels or
+ * grouped rels, which we very likely are, new hash tables will get built
+ * and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel = save_relinfolist(root->grouped_rel_list);
+
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list and grouped_rel_list to its
+ * former state, and put back original hashtables if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(root->grouped_rel_list, &save_grouped_rel);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 172edb643a..0ac2c2d507 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3306,6 +3394,318 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ /*
+ * If the grouped paths for the given relation are not considered useful,
+ * do not bother to generate them.
+ */
+ if (!agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3414,9 +3814,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
@@ -3465,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3485,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4353,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e1523d15df..3c38ed7843 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -180,6 +180,9 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost);
+static void set_joinpath_size(PlannerInfo *root, Path *path,
+ Path *outer_path, Path *inner_path,
+ SpecialJoinInfo *sjinfo, List *restrict_clauses);
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -3370,19 +3373,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
if (inner_path_rows <= 0)
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/* cost of inner-relation source data (we already dealt with outer rel) */
@@ -3822,19 +3814,8 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/*
* Compute cost of the mergequals and qpquals (other restriction clauses)
@@ -4254,19 +4235,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
path->jpath.path.disabled_nodes = workspace->disabled_nodes;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath.path, outer_path, inner_path,
+ extra->sjinfo, path->jpath.joinrestrictinfo);
/* mark the path with estimated # of batches */
path->num_batches = numbatches;
@@ -5014,6 +4984,60 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
*qpqual_cost = baserel->baserestrictcost;
}
+/*
+ * set_joinpath_size
+ * Set the correct row estimate for the given join path.
+ *
+ * 'path' is the join path under consideration.
+ * 'outer_path', 'inner_path' are Paths that produce the relations being
+ * joined.
+ * 'sjinfo' is any SpecialJoinInfo relevant to this join.
+ * 'restrict_clauses' lists the join clauses that need to be applied at the
+ * join node.
+ *
+ * Note that for a grouped join relation, its paths could have very different
+ * rowcount estimates, so we need to calculate the rowcount estimate using
+ * the pair of input paths provided.
+ */
+static void
+set_joinpath_size(PlannerInfo *root, Path *path,
+ Path *outer_path, Path *inner_path,
+ SpecialJoinInfo *sjinfo, List *restrict_clauses)
+{
+ if (IS_GROUPED_REL(path->parent))
+ {
+ /*
+ * Estimate the number of rows of this grouped join path as the sizes
+ * of the input paths times the selectivity of the clauses that have
+ * ended up at this join node.
+ */
+ path->rows = calc_joinrel_size_estimate(root,
+ path->parent,
+ outer_path->parent,
+ inner_path->parent,
+ outer_path->rows,
+ inner_path->rows,
+ sjinfo,
+ restrict_clauses);
+ }
+ else if (path->param_info)
+ path->rows = path->param_info->ppi_rows;
+ else
+ path->rows = path->parent->rows;
+
+ /*
+ * For partial paths, scale row estimate. We can skip this for grouped
+ * join paths.
+ */
+ if (path->parallel_workers > 0 && !IS_GROUPED_REL(path->parent))
+ {
+ double parallel_divisor = get_parallel_divisor(path);
+
+ path->rows =
+ clamp_row_est(path->rows / parallel_divisor);
+ }
+}
+
/*
* compute_semi_anti_join_factors
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..20698e48f0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +889,141 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+ bool yet_to_add = false;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ /*
+ * If the grouped paths for the given join relation are considered
+ * useful, add the grouped relation we just built to the PlannerInfo
+ * to make it available for further joining or for acting as the upper
+ * rel representing the result of partial aggregation. Otherwise, we
+ * need to postpone the decision on adding the grouped relation to the
+ * PlannerInfo, as it depends on whether we can generate any grouped
+ * paths by joining the given pair of input relations.
+ */
+ if (agg_info->agg_useful)
+ add_grouped_rel(root, rel_grouped);
+ else
+ yet_to_add = true;
+ }
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids);
+ rel2_grouped = find_grouped_rel(root, rel2->relids);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * Joining two grouped relations is currently not supported. Grouping one
+ * side would alter the multiplicity of the other side's aggregate transient
+ * states in the final aggregation input. While this issue could be
+ * addressed by adjusting the transient states, it is not deemed
+ * worthwhile for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * It shouldn't happen that rel1_grouped has been marked dummy in
+ * populate_joinrel_with_paths due to provably constant-false join
+ * restrictions; hence we cannot end up with a plan that has an Aggref
+ * in a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * It shouldn't happen that rel2_grouped has been marked dummy in
+ * populate_joinrel_with_paths due to provably constant-false join
+ * restrictions; hence we cannot end up with a plan that has an Aggref
+ * in a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+
+ /*
+ * Since we have generated grouped paths by joining the given pair of
+ * input relations, add the grouped relation to the PlannerInfo if we have
+ * not already done so.
+ */
+ if (yet_to_add)
+ add_grouped_rel(root, rel_grouped);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1816,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index c5bc0f51e9..d5811026f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -80,6 +81,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -386,6 +389,263 @@ add_vars_to_attr_needed(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * collect plain Vars for future reference
+ */
+ if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys, as
+ * defined by the equality operator, implies bitwise equality.
+ * Otherwise, if we put keys with different byte images into the same
+ * group, we may lose some information that could be needed to
+ * evaluate upper qual clauses.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
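+ /* simply pick the first opfamily if the operator belongs to several */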
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index e17d31a5c3..a8f102beb8 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,12 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list = makeNode(RelInfoList);
+ root->grouped_rel_list->items = NIL;
+ root->grouped_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -76,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -257,6 +264,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d92d43a17e..922cb7a793 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -227,7 +227,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4075,9 +4074,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4158,23 +4155,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7074,16 +7064,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys);
if (path == NULL)
@@ -7212,7 +7228,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7220,7 +7236,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7262,19 +7278,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7324,6 +7338,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * The partially_grouped_rel created by eager aggregation may be dummy.
+ * In that case we just reset it to NULL, and the logic below can build
+ * it again if that is possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7347,19 +7376,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7368,6 +7405,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 4989722637..4884d9ddea 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* Translate the target expressions */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..673e181b32 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
*
+ * For grouped relations, cheapest_parameterized_paths also always includes
+ * the fewest-row unparameterized path, if there is one. Different paths of
+ * a grouped relation can have very different row counts, and in some cases
+ * the cheapest-total unparameterized path is not the one with the fewest
+ * rows.
+ *
* This is normally called only after we've finished constructing the path
* list for the rel node.
*/
@@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel)
Path *cheapest_startup_path;
Path *cheapest_total_path;
Path *best_param_path;
+ Path *fewest_row_path;
List *parameterized_paths;
ListCell *p;
@@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel)
elog(ERROR, "could not devise a query plan for the given query");
cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
+ fewest_row_path = NULL;
parameterized_paths = NIL;
foreach(p, parent_rel->pathlist)
@@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path == NULL)
{
cheapest_startup_path = cheapest_total_path = path;
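+ /* for a grouped rel, also seed the fewest-row path */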
+ if (IS_GROUPED_REL(parent_rel))
+ fewest_row_path = path;
continue;
}
@@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel)
compare_pathkeys(cheapest_total_path->pathkeys,
path->pathkeys) == PATHKEYS_BETTER2))
cheapest_total_path = path;
+
+ /*
+ * Find the fewest-row unparameterized path for a grouped
+ * relation. If we find two paths of the same row count, try to
+ * keep the one with the cheaper total cost; if the costs are
+ * identical, keep the better-sorted one.
+ */
+ if (IS_GROUPED_REL(parent_rel))
+ {
+ if (fewest_row_path->rows > path->rows)
+ fewest_row_path = path;
+ else if (fewest_row_path->rows == path->rows)
+ {
+ cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST);
+ if (cmp > 0 ||
+ (cmp == 0 &&
+ compare_pathkeys(fewest_row_path->pathkeys,
+ path->pathkeys) == PATHKEYS_BETTER2))
+ fewest_row_path = path;
+ }
+ }
}
}
@@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path)
parameterized_paths = lcons(cheapest_total_path, parameterized_paths);
+ /* Add fewest-row unparameterized path, if any, to parameterized_paths */
+ if (fewest_row_path && fewest_row_path != cheapest_total_path)
+ parameterized_paths = lcons(fewest_row_path, parameterized_paths);
+
/*
* If there is no unparameterized path, use the best parameterized path as
* cheapest_total_path (but not as cheapest_startup_path).
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
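+ /* the path may now lie below joins, so inherit subpath's parameterization */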
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
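+ /* the path may now lie below joins, so inherit subpath's parameterization */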
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
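+ /* the path may now lie below joins, so inherit subpath's parameterization */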
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
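+ /* the path may now lie below joins, so inherit subpath's parameterization */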
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index d7266e4cdb..d4eb82da57 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,19 +28,26 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelHashEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelHashEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -83,7 +91,17 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
-
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+
+/*
+ * Minimum row reduction ratio (1 - grouped_rows / input_rows) at which
+ * grouped paths are considered useful.
+ */
+#define EAGER_AGGREGATE_RATIO 0.5
/*
* setup_simple_rel_arrays
@@ -276,6 +294,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +425,99 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If the grouped paths for the given base relation are not considered
+ * useful, do not build the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -479,11 +591,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +603,46 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelHashEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing RelOptInfos */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry corresponding to 'relids'.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +652,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +680,28 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->grouped_rel_list, relids);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -619,31 +752,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
+ * add_rel_info
+ * Add given relation to the list, and also add it to the auxiliary
+ * hashtable if there is one.
*/
static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
/* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
+ if (list->hash)
{
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = joinrel;
+ hentry->rel = rel;
}
}
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ add_rel_info(root->grouped_rel_list, rel);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -755,6 +910,7 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1095,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2508,3 +2665,485 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent must already
+ * have been created if that was possible. So we can just use the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) rel_grouped->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the row reduction ratio is no less than EAGER_AGGREGATE_RATIO.
+ */
+ agg_info->agg_useful =
+ (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
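+ /* partial aggregates are not grouping columns, so sortgroupref is 0 */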
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * row reduction ratio is no less than EAGER_AGGREGATE_RATIO.
+ */
+ result->agg_useful =
+ (result->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ */
+ SortGroupClause *sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators((castNode(Var, expr))->vartype,
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
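+ /*
+ * If we broke out of the loop above, lc is non-NULL and the Var is
+ * referenced by some aggregate; the Var then qualifies only if it does
+ * not also appear as a plain Var in the targetlist or havingQual.
+ */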
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final output. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
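+ /* attr_needed[] is indexed with an offset of min_attr */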
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 686309db58..7896c48fe2 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 667e0dc40a..2e9df56cf4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -413,6 +413,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 07e2415398..330a5f2f57 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relations.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,16 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+
+ /*
+ * grouped_rel_list is a list of all grouped-relation RelOptInfos we have
+ * considered in this planning run. This is only used by eager
+ * aggregation.
+ */
+ RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -373,6 +393,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -998,6 +1027,12 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1071,6 +1106,68 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3140,6 +3237,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 1035e6560c..d3c05a61ba 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,10 +314,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -353,4 +359,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..a189b7f18c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 93137261e4..5008c86feb 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -74,6 +74,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY on the other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index fad7fc3a7e..1dda69e7c2 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(22 rows)
+(23 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f38104ba0..7bff358315 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY on the other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c4de597b1f..4c623a0854 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1063,6 +1064,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -1294,7 +1296,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2374,12 +2375,16 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
+RelHashEntry
RelIdCacheEnt
RelInfo
RelInfoArr
+RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Sat, Oct 5, 2024 at 6:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is an updated version of this patch that fixes the rowcount
estimate issue along this routine (see set_joinpath_size).
I have worked on inventing some heuristics to limit the planning
thinking of is to consider a grouped path as NOT useful if its row
reduction ratio falls below a predefined minimum threshold. Currently
I'm using 0.5 as the threshold, but I'm open to other values.
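To make that concrete, here is a minimal sketch of the check I have in
mind, reusing the grouped_rows and agg_useful fields of RelAggInfo from
the patch (the exact placement, and the way the reduction ratio is
computed against the input rel's row estimate, are only assumptions of
this sketch):

    /*
     * Hypothetical check: consider the grouped paths useful only if
     * partial aggregation reduces the row count by at least half.
     */
    #define EAGER_AGG_MIN_ROW_REDUCTION	0.5

    agg_info->agg_useful =
        (agg_info->grouped_rows <=
         rel->rows * EAGER_AGG_MIN_ROW_REDUCTION);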
I ran the TPC-DS benchmark at scale 10 and observed eager aggregation
applied in several queries, including q4, q8, q11, q23, q31, q33, and
q77. Notably, the regression in q19 that Tender identified with v11
has disappeared in v13.
Here’s a comparison of Execution Time and Planning Time for the seven
queries with eager aggregation disabled versus enabled (best of 3).
Execution Time:
EAGER-AGG-OFF EAGER-AGG-ON
q4 105787.963 ms 34807.938 ms
q8 1407.454 ms 1654.923 ms
q11 67899.213 ms 18670.086 ms
q23 45945.849 ms 42990.652 ms
q31 10463.536 ms 10244.175 ms
q33 2186.928 ms 2217.228 ms
q77 2360.565 ms 2416.674 ms
Planning Time:
EAGER-AGG-OFF EAGER-AGG-ON
q4 2.334 ms 2.602 ms
q8 0.685 ms 0.647 ms
q11 0.935 ms 1.094 ms
q23 2.666 ms 2.582 ms
q31 1.051 ms 1.206 ms
q33 1.248 ms 1.796 ms
q77 0.967 ms 0.962 ms
There are good performance improvements in q4 and q11 (3~4 times).
For the other queries, execution times remain largely unchanged,
falling within the margin of error, with no notable regressions
observed.
For the planning time, I do not see notable regressions for any of the
seven queries.
It seems that the new cost estimates and the new heuristic are working
pretty well.
Thanks
Richard
On Sat, Oct 5, 2024 at 6:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Sep 27, 2024 at 11:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is an updated version of this patch that fixes the rowcount
estimate issue along this routine. (see set_joinpath_size.)
In the function setup_eager_aggregation, can we be more conservative
about the cases where eager aggregation can be applied? I see the
following cases where eager aggregation is not OK. We can return
earlier, so we won't call create_grouping_expr_infos(root) and
create_agg_clause_infos(root), and thus avoid unintended consequences.
1. root->parse->resultRelation > 0
Just to be 100% sure we are only dealing with SELECT; or we can add an
Assert at the end of setup_eager_aggregation.
2. The join type is FULL JOIN (I am not sure about semi-join and
anti-join types).
3. root->parse->windowClause != NIL
I am not sure whether enable_eager_aggregate can be useful when a
LIMIT clause is present; the code comments do not mention it. I am
also not sure about locking clauses, since the code does not mention
them either.
EXPLAIN (COSTS OFF, settings, verbose)
SELECT avg(t2.c)
FROM (select * from eager_agg_t1 for update) t1 JOIN (select * from
eager_agg_t2 for update) t2 ON t1.b = t2.b GROUP BY t1.a;
can eager aggregate apply to above query?
In struct PlannerInfo:
/* list of AggClauseInfos */
List *agg_clause_list;
/* list of GroupExprInfos */
List *group_expr_list;
/* list of plain Vars contained in targetlist and havingQual */
List *tlist_vars;
we can comment that agg_clause_list and tlist_vars are unique.
There is no doc entry in doc/src/sgml/config.sgml. We can put one
after the varlistentry for enable_bitmapscan, and at least mention
that enable_eager_aggregate's default value is <literal>off</literal>.
There are no tests for aggregates with FILTER clauses, though the
patch currently seems to support them.
Some of the "foreach" loops can be rewritten with foreach_node; see
https://git.postgresql.org/cgit/postgresql.git/commit/?id=14dd0f27d7cd56ffae9ecdbe324965073d01a9ff
/*
* Eager aggregation is only possible if equality of grouping keys, as
* defined by the equality operator, implies bitwise equality.
* Otherwise, if we put keys with different byte images into the same
* group, we may lose some information that could be needed to
* evaluate upper qual clauses.
*
* For example, the NUMERIC data type is not supported because values
* that fall into the same group according to the equality operator
* (e.g. 0 and 0.0) can have different scale.
*/
    tce = lookup_type_cache(exprType((Node *) tle->expr),
                            TYPECACHE_BTREE_OPFAMILY);
    if (!OidIsValid(tce->btree_opf) ||
        !OidIsValid(tce->btree_opintype))
        return;

    equalimageproc = get_opfamily_proc(tce->btree_opf,
                                       tce->btree_opintype,
                                       tce->btree_opintype,
                                       BTEQUALIMAGE_PROC);
    if (!OidIsValid(equalimageproc) ||
        !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
                                           tce->typcollation,
                                           ObjectIdGetDatum(tce->btree_opintype))))
        return;
I am confused by BTEQUALIMAGE_PROC.
* To facilitate B-Tree deduplication, an operator class may choose to
* offer a fourth amproc procedure (BTEQUALIMAGE_PROC). For full details,
* see doc/src/sgml/btree.sgml.
the above is comments about BTEQUALIMAGE_PROC in src/include/access/nbtree.h
equalimage
Optionally, a btree operator family may provide equalimage (“equality implies
image equality”) support functions, registered under support function number 4.
These functions allow the core code to determine when it is safe to apply the
btree deduplication optimization. Currently, equalimage functions are only
called when building or rebuilding an index.
the above is BTEQUALIMAGE_PROC on
https://www.postgresql.org/docs/current/btree.html#BTREE-SUPPORT-FUNCS
Integer types support eager aggregation:
select amproc.*, amproclefttype::regtype
from pg_amproc amproc join pg_opfamily opf on amproc.amprocfamily = opf.oid
where amproc.amprocnum = 4
and amproc.amproclefttype = amproc.amprocrighttype
and opf.opfmethod = 403
and amproc.amprocrighttype = 'int'::regtype;
returns
  oid  | amprocfamily | amproclefttype | amprocrighttype | amprocnum |    amproc    | amproclefttype
-------+--------------+----------------+-----------------+-----------+--------------+----------------
 10052 |         1976 |             23 |              23 |         4 | btequalimage | integer
But btequalimage returns true unconditionally. So overall I doubt that
the BTEQUALIMAGE_PROC flag usage here is correct.
On Fri, Oct 18, 2024 at 12:44 PM jian he <jian.universality@gmail.com> wrote:
1. root->parse->resultRelation > 0
Just to be 100% sure we are only dealing with SELECT; or we can add an
Assert at the end of setup_eager_aggregation.
Can GROUP BY clauses be used in INSERT/UPDATE/DELETE/MERGE statements?
If not, I think there is no need to check 'resultRelation > 0', as
setup_eager_aggregation already checks for GROUP BY clauses.
2. The join type is FULL JOIN (I am not sure about semi-join and
anti-join types).
The presence of a FULL JOIN does not preclude the use of eager
aggregation. We still can push a partial aggregation down to a level
that is above the FULL JOIN.
3. root->parse->windowClause != NIL
Why does the presence of windowClause prevent the use of eager
aggregation?
There is no doc entry in doc/src/sgml/config.sgml. We can put one
after the varlistentry for enable_bitmapscan, and at least mention
that enable_eager_aggregate's default value is <literal>off</literal>.
Yeah, that's what I need to do.
Thanks
Richard
On Fri, Oct 18, 2024 at 10:22 PM jian he <jian.universality@gmail.com> wrote:
So overall I doubt that the BTEQUALIMAGE_PROC flag usage here is correct.
The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for
types whose equality operators do not imply bitwise equality, such as
NUMERIC.
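For instance, a minimal demonstration of the information loss in plain
SQL (independent of the patch):

-- 0 and 0.0 are equal under numeric's = operator, yet their stored
-- byte images (and hence their displayed scale) differ; if both fall
-- into one partial group, whichever value represents the group
-- determines what any expression above the aggregation sees.
SELECT 0::numeric = 0.0::numeric AS equal,
       0::numeric::text AS lhs,
       0.0::numeric::text AS rhs;
-- equal | lhs | rhs
-- t     | 0   | 0.0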
After a second thought, I think it should be OK to just check the
equality operator specified by the SortGroupClause for btree equality.
I’m not very sure about this point, though, and would appreciate any
inputs.
Thanks
Richard
On Wed, Sep 25, 2024 at 3:03 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <tndrwang@gmail.com> wrote:
1. In make_one_rel(), we have the below codes:
/*
* Build grouped base relations for each base rel if possible.
*/
setup_base_grouped_rels(root);
As far as I know, each base rel only has one grouped base relation, if possible.
The comments may be changed to "Build a grouped base relation for each base rel if possible."
Yeah, each base rel has only one grouped rel. However, there is a
comment nearby stating 'consider_parallel flags for each base rel',
which confuses me about whether it should be singular or plural in
this context. Perhaps someone more proficient in English could
clarify this.
It's not confusing the way you have it, but I think an English teacher
wouldn't like it, because part of the sentence is singular ("each base
rel") and the other part is plural ("grouped base relations").
Tender's proposed rewrite fixes that. Another way to fix it is to
write "Build group relations for base rels where possible".
2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped
relation on top of paths of join relation. So the "rel_plain" argument in generate_grouped_paths() may be
confusing. "plain" usually means "base rel". How about renaming rel_plain to input_rel?
I don't think 'plain relation' necessarily means 'base relation'. In
this context I think it can mean 'non-grouped relation'. But maybe
I'm wrong.
We use the term "plain relation" in several different ways. In the
header comments for addFkRecurseReferenced, it means a non-partitioned
relation. In the struct comments for RangeTblEntry, it means any sort
of named thing in pg_class that you can scan, so either a partitioned
or unpartitioned table but not a join or a table function or
something. AFAICT, the most common meaning of "plain relation" is a
pg_class entry where relkind==RELKIND_RELATION.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 24, 2024 at 11:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
The reason is that it is very tricky to set the size estimates for a
grouped join relation. For a non-grouped join relation, we know that
all its paths have the same rowcount estimate (well, in theory). But
this is not true for a grouped join relation. Suppose we have a
grouped join relation for t1/t2 join. There might be two paths for
it:
What exactly do you mean by "well, in theory" here? My understanding
of how things work today is that every relation is supposed to produce
a specific set of rows and every unparameterized path must produce
that set of rows. The order of the rows may vary but the set of rows
may not. With your proposed design here, that's no longer true.
Instead, what's promised is that the row sets will become equivalent
after a later FinalizeAggregate step. In a sense, this is like
parameterization or partial paths. Suppose I have:
SELECT * FROM foo, bar WHERE foo.x = bar.x;
While every unparameterized path for bar has the same row count,
there's also the possibility of performing an index scan on bar.x
parameterized by foo.x, and that path will have a far lower row count
than the unparameterized paths. Instead of producing all the same rows
as every other path, the parameterized path promises only that if run
repeatedly, with all relevant values of foo.x, you'll eventually get
all the same rows you would have gotten from the unparameterized path.
Because of this difference, parameterized paths need special handling
in many different parts of the code.
And the same thing is true of partial paths. They also do not promise
to generate all the same rows -- instead, they promise that when run
simultaneously across multiple workers, the total set of rows returned
across all invocations will be equal to what a normal path would have
produced. Here again, there's a need for special handling because
these paths behave differently than standard paths.
I think what you're doing here is roughly equivalent to either of
these two cases. It's more like the parameterized path case. Instead
of having a path for a relation which is parameterized by some input
parameter, you have a path for a relation, say bar, which is partially
aggregated by some grouping column. But there's no guarantee of how
much partial aggregation has been done. In your example, PartialAgg(t1
JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row
counts are different. This makes me quite nervous. You can't compare a
parameterized path to an unparameterized path, but you can compare it
to another parameterized path if the parameterizations are the same.
You can't compare a partial path to a non-partial path, but you can
compare partial paths to each other. But with this work,
unparameterized, non-partial paths in the same RelOptInfo don't seem
like they are truly comparable. Maybe that's OK, but I'm not sure that
it isn't going to break other things.
You might for example imagine a design where PartialAgg(t1 JOIN t2)
and t1 JOIN PartialAgg(t2) get separate RelOptInfos. After all, there
are probably multiple ways to generate paths for each of those things,
and paths in each category can be compared to each other apples to
apples. What's less clear is whether it's fair to compare across the
two categories, and how many assumptions will be broken by doing so.
I'm not sure that it's right to have separate RelOptInfos; we
definitely don't want to create more RelOptInfos than necessary. At
the same time, if we mix together all of those paths into a single
RelOptInfo, we need to be confident that we're neither going to break
anything nor introduce too many special cases into hot code paths. For
instance, set_joinpath_size() represents an unwelcome complexity
increase that could impact performance generally, even apart from the
cases this patch intends to handle.
It's tempting to wonder if there's some way that we can avoid
generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN
PartialAgg(t2). Either the former has lower cardinality, or the latter
does. It seems likely that the lower-cardinality set is the winning
strategy. Even if the path has higher cost to generate, we save work
at every subsequent join level and at the final aggregation step. Are
there counterexamples where it's better to use a path from the
higher-cardinality set?
By the way, the work of figuring out what target list should be
produced by partial grouping is done by init_grouping_targets(), but
the comments seem to take it for granted that I know what result we're
trying to produce, and I don't. I think some more high-level
explanation of the goals of this code would be useful. It seems to me
that if I'm looking at a path for an ungrouped relation and it
produces a certain target list, then every column of that target list
is needed somewhere. If those columns are group keys, cool: we pass
those through. If they're inputs to the aggregates, cool: we feed them
to the aggregates. But if they are neither, then what? In the patch,
you either group on those columns or add them to the
possibly_dependent list depending on the result of
is_var_needed_by_join(). I can believe that there are some cases where
we can group on such columns and others where we can't, but find it
difficult to believe that this test reliably distinguishes between
those two cases. If it does, I don't understand why it does. Don't I
need to know something about how those columns are used in the upper
joins? Like, if those columns are connected by a chain of
binary-equality operators back to the user's choice of grouping
columns, that sounds good, but this test doesn't distinguish between
that case and an upper join on the < operator.
create_grouping_expr_infos() does reason based on whether there's an
equal-image operator available, but AIUI that's only reasoning about
the group columns the user mentioned, not the sort of implicit
grouping columns that init_grouping_targets() is creating.
I spent a lot of time thinking today about what makes it safe to push
down grouping or not. I'm not sure that I have a solid answer to that
question even yet. But it seems to me that there are at least two
cases that clearly won't fly. One is the case where the partial
aggregation would merge rows that need to be included in the final
aggregation with rows that should be filtered out. If the
partially-grouped relation simply has a filter qual, that's fine,
because it will be evaluated before the aggregation. But there might
be a qual that has to be evaluated later, either because (a) it
involves another rel, like this_rel.x + that_rel.y > 10 or (b) it
appears in the ON clause of an outer join and thus needs to be
deferred to the level of the OJ, e.g. A LEFT JOIN B ON a.x = b.x AND
b.y = 42. I wonder if you can comment on how these cases are handled.
Perhaps this coding around functional dependencies has something to do
with it, but it isn't clear to me.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Oct 5, 2024 at 11:30 PM Richard Guo <guofenglinux@gmail.com> wrote:
Here’s a comparison of Execution Time and Planning Time for the seven
queries with eager aggregation disabled versus enabled (best of 3).
Execution Time:
EAGER-AGG-OFF EAGER-AGG-ON
q4 105787.963 ms 34807.938 ms
q8 1407.454 ms 1654.923 ms
q11 67899.213 ms 18670.086 ms
q23 45945.849 ms 42990.652 ms
q31 10463.536 ms 10244.175 ms
q33 2186.928 ms 2217.228 ms
q77 2360.565 ms 2416.674 ms
Could you attach the EXPLAIN ANALYZE output for these queries, with
and without the patch?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Oct 29, 2024 at 3:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Fri, Oct 18, 2024 at 10:22 PM jian he <jian.universality@gmail.com> wrote:
So overall I doubt that the BTEQUALIMAGE_PROC flag usage here is correct.
The BTEQUALIMAGE_PROC flag is used to prevent eager aggregation for
types whose equality operators do not imply bitwise equality, such as
NUMERIC.
After a second thought, I think it should be OK to just check the
equality operator specified by the SortGroupClause for btree equality.
I’m not very sure about this point, though, and would appreciate any
inputs.
Well, the key thing here is the reasoning, which you don't really
spell out either here or in the patch. The patch's justification for
the use of BTEQUALIMAGE_PROC is that, if we use non-equal-image
operators, "we may lose some information that could be needed to
evaluate upper qual clauses." But there's no explanation of why that
would happen, and I think that makes it difficult to have a good
discussion.
Here's an example from the optimizer README:
# 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
# = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
#
# Identity 3 only holds if predicate Pbc must fail for all-null B rows
# (that is, Pbc is strict for at least one column of B). If Pbc is not
# strict, the first form might produce some rows with nonnull C columns
# where the second form would make those entries null.
This isn't the clearest justification for a rule that I've ever read,
but it's something. Someone reading this can think about whether the
two sentences of justification are a correct argument that Pbc must be
strict in order for the identity to hold.
I think you should be trying to offer similar (or better)
justifications, and perhaps similar identities as well. It's a little
bit hard to think about how to write the identities for this patch,
although you could start with aggregate(X) =
finalizeagg(partialagg(X)) and then explain how the partialagg can be
commuted with certain joins and what is required for it to do so. The
argument needs to be detailed enough that we can evaluate whether it's
correct or not.
Personally, I'm still stuck on the point I wrote about yesterday. When
you partially aggregate a set of rows, the resulting row serves as a
proxy for the original set of rows. So, all of those rows must have
the same destiny. As I mentioned then, if you ever partially aggregate
a row that should ultimately be filtered out with one that shouldn't,
that breaks everything. But also, I need all the rows that feed into
each partial group to be part of the same set of output rows. For
instance, suppose I run a report like this:
SELECT o.order_number, SUM(l.quantity) FROM orders o, order_lines l
WHERE o.order_id = l.order_id GROUP BY 1;
If I'm thinking of executing this as FinalizeAgg(order JOIN
PartialAgg(order_lines)), what must be true in order for that to be a
valid execution plan? I certainly can't aggregate on some unrelated
column, say part_number. If I do, then first of all I don't know how
to get an order_id column out so that I can still do the join. But
even if that were no issue, you might have two rows with different
values of the order_id column and the same value for part_number, and
that the partial groups that you've created on the order_lines table
don't line up properly with the groups that you need to create on the
orders table. Some particular order_id might need some of the rows
that went into a certain part_number group and not others; that's no
good.
After some thought, I think the process of deduction goes something
like this. We ultimately need to aggregate on a column in the orders
table, but we want to partially aggregate the order_lines table. So we
need a set of columns in the order_lines table that can act as a proxy
for o.order_number. Our proxy should be all the columns that appear in
the join clauses between orders and order_lines. That way, all the
rows in a single partially aggregate row will have the "same" order_id
according to the operator class implementing the = operator, so for a
given row in the "orders" table, either every row in the group has
o.order_id = l.order_id or none of them do; hence they're all part of
the same set of output rows.
It doesn't matter, for example, whether o.order_number is unique. If
it isn't, then we'll flatten together some different rows from the
orders table -- but each of those rows will match all the rows in
partial groupings where o.order_id = partially_grouped_l.order_id and
none of the rows where that isn't the case, so the optimization is
still valid. Likewise it doesn't matter whether o.order_id is unique:
if order_id = 1 occurs twice, then both of the rows will match the
partially aggregated group with order_id = 1, but that's fine. The
only thing that's a problem is if the same row from orders was going
to match some but not all of the partially aggregate rows from some
order_lines group, and that won't happen here.
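To spell out the transformation being discussed in plain SQL (just an
equivalence sketch, not the planner's actual mechanism; partial_sum is
an illustrative alias), the plan shape FinalizeAgg(orders JOIN
PartialAgg(order_lines)) corresponds to rewriting the query as:

SELECT o.order_number, SUM(l.partial_sum)
FROM orders o,
     (SELECT order_id, SUM(quantity) AS partial_sum
      FROM order_lines
      GROUP BY order_id) l  -- partial aggregation on the join column
WHERE o.order_id = l.order_id
GROUP BY 1;

This is equivalent to the original query precisely because, as argued
above, grouping order_lines on its join column makes each partial
group match a given orders row either entirely or not at all.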
Now consider this example:
SELECT o.order_number, SUM(l.quantity) FROM orders o, order_lines l
WHERE o.order_id = l.order_id AND o.something < l.something GROUP BY
1;
Here, we cannot partially group order_lines on just l.order_id,
because we might have a row in the orders table with order_id = 1 and
something = 1 and two rows in the order_lines table that both have
order_id = 1 but one has something = 0 and the other has something =
2. The orders row needs to join to one of those but not the other, so
we can't put them in the same partial group. However, it seems like it
would be legal to group order_lines on (order_id, something), provided
that the operator classes we use for the grouping operation match
the operator classes of the operators in the join clause - i.e. we
group on order_id using the operator class of = and on something using
the operator class of <. If the operator classes don't match in this
way, then it could happen that the grouping operator thinks the values
are the same but the join operator thinks they're different.
(Everything is also OK if the grouping operator tests
bitwise-equality, because then nothing can ever get merged that
shouldn't; but other notions of equality are also fine as long as
they're compatible with the operator actually used.)
Now let's consider this example, using an imaginary user-defined operator:
SELECT o.order_number, SUM(l.quantity) FROM orders o, order_lines l
WHERE o.order_id = l.order_id AND o.something ?!?! l.something GROUP
BY 1;
Here, we can partially aggregate order_lines by (order_id, something)
as long as order_id is grouped using bitwise equality OR the same
operator class as the = operator used in the query, and something has
to use bitwise equality.
What about this:
SELECT o.order_number, SUM(l.quantity) FROM orders o LEFT JOIN
order_lines l ON o.order_id = l.order_id AND l.something = 1 GROUP BY
1;
It's really important that we don't try to aggregate on just
l.order_id here. Some rows in the group might have l.something = 1 and
others not. It would be legal (but probably not efficient) to
aggregate order_lines on (order_id, something).
So it seems to me that the general rule here is that when the table
for which we need a surrogate key is directly joined to the table
where the original grouping column is located, we need to group on all
columns involved in the join clause, using either compatible operators
or bitwise equality operators. As a practical matter, we probably only
want to do the optimization when all of the join clauses are
equijoins. Then we don't need to worry about bitwise equality at all,
AFAICS; we just group using the operator class that contains the
operator specified by the user.
What if there are more than two tables involved, like this:
SELECT o.order_number, MAX(n.creation_time) FROM orders o, order_lines
l, order_line_notes n WHERE o.order_id = l.order_id AND l.note_id =
n.note_id GROUP BY 1;
Here, there's no direct connection between the table with the original
grouping column and the table we want to aggregate. Using the rule
mentioned above, we can deduce that l.order_id is a valid grouping
column for order_lines. Applying the same rule again, we can then
deduce that n.note_id is a valid grouping column for order_line_notes.
We could either group order_line_notes on n.note_id and then perform
the remaining joins; or we could join order_lines and order_line_notes
and then partially aggregate the result on l.order_id.
What if there are multiple grouping columns, like this:
SELECT foo.x, bar.y, SUM(baz.z) FROM foo, bar, baz WHERE foo.a = baz.a
AND bar.b = baz.b GROUP BY 1, 2;
In this case, we need to find a surrogate grouping column for foo.x
and also a surrogate grouping column for bar.y. We can group baz on a,
b; but not just on a or just on b.
Finally, I think this example is interesting:
SELECT p.title, SUM(c.stuff) FROM parent p, link l, child c WHERE p.x
= l.x AND l.y = c.y AND p.z = c.z GROUP BY 1;
Based on the rule that I articulated earlier, you might think we can
just look at the join clauses between parent and child and group the
child on c.z. However, that's not correct -- we'd have to group on
both c.x and c.z. I'm not sure how to adjust the rule so that it
produces the right answer here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Oct 29, 2024 at 3:51 AM Richard Guo <guofenglinux@gmail.com> wrote:
2. The join type is FULL JOIN (I am not sure about semi-join and
anti-join types).
The presence of a FULL JOIN does not preclude the use of eager
aggregation. We still can push a partial aggregation down to a level
that is above the FULL JOIN.
I think you can also push a partial aggregation step through a FULL
JOIN. Consider this:
SELECT p.name, string_agg(c.name, ', ') FROM parents p FULL JOIN
children c ON p.id = c.parent_id GROUP BY 1;
I don't see why it matters here that this is a FULL JOIN. If it were
an inner join, we'd have one group for every parent that has at least
one child. Since it's a full join, we'll also get one group for every
parent with no children, and also one group for the null parent. But
that should work fine with a partially aggregated c.
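The same kind of plain-SQL sketch for the FULL JOIN case (again an
equivalence argument, not the planner's mechanism):

SELECT p.name, string_agg(c.partial, ', ')
FROM parents p
     FULL JOIN (SELECT parent_id,
                       string_agg(name, ', ') AS partial
                FROM children
                GROUP BY parent_id) c
     ON p.id = c.parent_id
GROUP BY 1;

-- Parents with no children still show up with a NULL aggregate, and
-- children with no parent all land in the NULL-name group, matching
-- the original query up to the unspecified ordering inside each
-- string_agg.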
--
Robert Haas
EDB: http://www.enterprisedb.com
hi.
still trying to understand v13. I found a bug.
Minimal test:
drop table if exists t1, t2;
CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (a int, b int, c int);
SET enable_eager_aggregate TO on;
explain(costs off, settings) SELECT avg(t2.a), t1.c FROM t1 JOIN t2 ON
t1.b = t2.b GROUP BY t1.c having grouping(t1.c) > 0;
In create_agg_clause_infos:

    foreach(lc, tlist_exprs)
    {
        Expr       *expr = (Expr *) lfirst(lc);

        if (IsA(expr, GroupingFunc))
            return;
    }

    if (root->parse->havingQual != NULL)
    {
        List       *having_exprs;

        having_exprs = pull_var_clause((Node *) root->parse->havingQual,
                                       PVC_INCLUDE_AGGREGATES |
                                       PVC_RECURSE_PLACEHOLDERS);
        if (having_exprs != NIL)
        {
            tlist_exprs = list_concat(tlist_exprs, having_exprs);
            list_free(having_exprs);
        }
    }
havingQual can contain a GroupingFunc.
If that happens, we get a segmentation fault.
On Tue, Oct 29, 2024 at 9:59 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 25, 2024 at 3:03 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <tndrwang@gmail.com> wrote:
1. In make_one_rel(), we have the below codes:
/*
* Build grouped base relations for each base rel if possible.
*/
setup_base_grouped_rels(root);
As far as I know, each base rel only has one grouped base relation, if possible.
The comments may be changed to "Build a grouped base relation for each base rel if possible."
Yeah, each base rel has only one grouped rel. However, there is a
comment nearby stating 'consider_parallel flags for each base rel',
which confuses me about whether it should be singular or plural in
this context. Perhaps someone more proficient in English could
clarify this.
It's not confusing the way you have it, but I think an English teacher
wouldn't like it, because part of the sentence is singular ("each base
rel") and the other part is plural ("grouped base relations").
Tender's proposed rewrite fixes that. Another way to fix it is to
write "Build group relations for base rels where possible".
Thank you for the suggestion. The new wording looks much better
grammatically. It seems to me that we should address the nearby
comment too, which goes like "consider_parallel flags for each base
rel", as each rel has only one consider_parallel flag.
2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped
relation on top of paths of join relation. So the "rel_plain" argument in generate_grouped_paths() may be
confusing. "plain" usually means "base rel". How about renaming rel_plain to input_rel?
I don't think 'plain relation' necessarily means 'base relation'. In
this context I think it can mean 'non-grouped relation'. But maybe
I'm wrong.
We use the term "plain relation" in several different ways. In the
header comments for addFkRecurseReferenced, it means a non-partitioned
relation. In the struct comments for RangeTblEntry, it means any sort
of named thing in pg_class that you can scan, so either a partitioned
or unpartitioned table but not a join or a table function or
something. AFAICT, the most common meaning of "plain relation" is a
pg_class entry where relkind==RELKIND_RELATION.
Agreed.
Thanks
Richard
On Wed, Oct 30, 2024 at 5:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 24, 2024 at 11:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
The reason is that it is very tricky to set the size estimates for a
grouped join relation. For a non-grouped join relation, we know that
all its paths have the same rowcount estimate (well, in theory). But
this is not true for a grouped join relation. Suppose we have a
grouped join relation for t1/t2 join. There might be two paths for
it:
What exactly do you mean by "well, in theory" here? My understanding
of how things work today is that every relation is supposed to produce
a specific set of rows and every unparameterized path must produce
that set of rows. The order of the rows may vary but the set of rows
may not. With your proposed design here, that's no longer true.
Instead, what's promised is that the row sets will become equivalent
after a later FinalizeAggregate step. In a sense, this is like
parameterization or partial paths.
Yeah, you're correct that currently each relation is expected to
produce the same specific set of rows. When I say "well, in theory" I
mean that for a join relation, all its unparameterized paths should
theoretically have the same row count estimate. However, in practice,
because there is more than one way to make a joinrel for more than
two base relations, and the selectivity estimation routines don’t
handle all cases equally well, we might get different row count
estimates depending on the pair provided.
And yes, with the grouped relations proposed in this patch, the
situation changes. For a grouped join relation, different paths can
produce very different row sets, depending on where the partial
aggregation is placed in the path tree. This is also why we need to
recalculate the row count estimate for a grouped join path using its
outer and inner paths provided, rather than using path->parent->rows
directly. This is very much like the parameterized path case.
I think what you're doing here is roughly equivalent to either of
these two cases. It's more like the parameterized path case. Instead
of having a path for a relation which is parameterized by some input
parameter, you have a path for a relation, say bar, which is partially
aggregated by some grouping column. But there's no guarantee of how
much partial aggregation has been done. In your example, PartialAgg(t1
JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row
counts are different. This makes me quite nervous. You can't compare a
parameterized path to an unparameterized path, but you can compare it
to another parameterized path if the parameterizations are the same.
You can't compare a partial path to a non-partial path, but you can
compare partial paths to each other. But with this work,
unparameterized, non-partial paths in the same RelOptInfo don't seem
like they are truly comparable. Maybe that's OK, but I'm not sure that
it isn't going to break other things.
Perhaps we could introduce a GroupPathInfo into the Path structure to
store common information for a grouped path, such as the location of
the partial aggregation (which could be the relids of the relation on
top of which we place the partial aggregation) and the estimated
rowcount for this grouped path, similar to how ParamPathInfo functions
for parameterized paths. Then we should be able to compare the
grouped paths of the same location apples to apples. I haven’t
thought this through in detail yet, though.
It's tempting to wonder if there's some way that we can avoid
generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN
PartialAgg(t2). Either the former has lower cardinality, or the latter
does. It seems likely that the lower-cardinality set is the winning
strategy. Even if the path has higher cost to generate, we save work
at every subsequent join level and at the final aggregation step. Are
there counterexamples where it's better to use a path from the
higher-cardinality set?
This is very appealing if we can retain only the lower-cardinality
path, as it would greatly simplify the overall design. I haven't
looked for counterexamples yet, but I plan to do so later.
By the way, the work of figuring out what target list should be
produced by partial grouping is done by init_grouping_targets(), but
the comments seem to take it for granted that I know what result we're
trying to produce, and I don't. I think some more high-level
explanation of the goals of this code would be useful. It seems to me
that if I'm looking at a path for an ungrouped relation and it
produces a certain target list, then every column of that target list
is needed somewhere. If those columns are group keys, cool: we pass
those through. If they're inputs to the aggregates, cool: we feed them
to the aggregates. But if they are neither, then what? In the patch,
you either group on those columns or add them to the
possibly_dependent list depending on the result of
is_var_needed_by_join(). I can believe that there are some cases where
we can group on such columns and others where we can't, but find it
difficult to believe that this test reliably distinguishes between
those two cases. If it does, I don't understand why it does. Don't I
need to know something about how those columns are used in the upper
joins? Like, if those columns are connected by a chain of
binary-equality operators back to the user's choice of grouping
columns, that sounds good, but this test doesn't distinguish between
that case and an upper join on the < operator.
create_grouping_expr_infos() does reason based on whether there's an
equal-image operator available, but AIUI that's only reasoning about
the group columns the user mentioned, not the sort of implicit
grouping columns that init_grouping_targets() is creating.
Yeah, this patch does not get it correct here. Basically the logic is
that for the partial aggregation pushed down to a non-aggregated
relation, we need to consider all columns of that relation involved in
upper join clauses and include them in the grouping keys. Currently,
the patch only checks whether a column is involved in upper join
clauses but does not verify how the column is used. We need to ensure
that the operator used in the join clause is at least compatible with
the grouping operator; otherwise, the grouping operator might
interpret the values as the same while the join operator sees them as
different.
Thanks
Richard
On Thu, Oct 31, 2024 at 12:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
Well, the key thing here is the reasoning, which you don't really
spell out either here or in the patch. The patch's justification for
the use of BTEQUALIMAGE_PROC is that, if we use non-equal-image
operators, "we may lose some information that could be needed to
evaluate upper qual clauses." But there's no explanation of why that
would happen, and I think that makes it difficult to have a good
discussion.
[...]
So it seems to me that the general rule here is that when the table
for which we need a surrogate key is directly joined to the table
where the original grouping column is located, we need to group on all
columns involved in the join clause, using either compatible operators
or bitwise equality operators.
[...]
Based on the rule that I articulated earlier, you might think we can
just look at the join clauses between parent and child and group the
child on c.z. However, that's not correct -- we'd have to group on
both c.y and c.z. I'm not sure how to adjust the rule so that it
produces the right answer here.
Thank you for the thorough deduction process; this is something I
should have done before proposing the patch. As we discussed
off-list, what I need to do next is to establish a theory that proves
the transformation proposed in this patch is correct in all cases.
What I have in mind now is that to push a partial aggregation down to
a relation, we need to group by all the columns of that relation
involved in the upper join clauses, using compatible operators. This
is essential to ensure that an aggregated row from the partial
aggregation matches the other side of the join if and only if each row
in the partial group does, thereby ensuring that all rows in the same
partial group have the same 'destiny'.
Thanks
Richard
On Thu, Oct 31, 2024 at 9:16 PM jian he <jian.universality@gmail.com> wrote:
hi.
still trying to understand v13. I found a bug. Minimal test:
drop table if exists t1, t2;
CREATE TABLE t1 (a int, b int, c int);
CREATE TABLE t2 (a int, b int, c int);
SET enable_eager_aggregate TO on;
explain(costs off, settings) SELECT avg(t2.a), t1.c FROM t1 JOIN t2 ON
t1.b = t2.b GROUP BY t1.c having grouping(t1.c) > 0;
havingQual can contain a GroupingFunc; if that happens, we get a
segmentation fault.
Nice catch. Thanks.
create_agg_clause_infos does check for GROUPING() expressions, but
not in the right place. Will fix it in the next version.
Thanks
Richard
On Fri, Nov 1, 2024 at 2:21 AM Richard Guo <guofenglinux@gmail.com> wrote:
... an aggregated row from the partial
aggregation matches the other side of the join if and only if each row
in the partial group does, thereby ensuring that all rows in the same
partial group have the same 'destiny'.
Ah, I really like this turn of phrase! I think it's clearer and
simpler than what I said. And I think it implies that we don't need to
explicitly deduce surrogate grouping keys. For example if we have A
JOIN B JOIN C JOIN D JOIN E JOIN F, grouped by columns from A, we
don't need to work out surrogate grouping keys for B and then C and
then D and then E and then F. We can just look at F's join clauses and
that tells us how to group, independent of anything else.
But is there any hole in that approach? I think the question is
whether the current row could be used in some way that doesn't show up
in the join clauses. I can't think of any way for that to happen,
really. I believe that any outerjoin-delayed quals will show up as
join clauses, and stuff like ORDER BY and HAVING will happen after the
aggregation (at least logically) so it should be fine. Windowed
functions and ordered aggregates may be a blocker to the optimization,
though: if the window function needs access to the unaggregated rows,
or even just needs to know how many rows there are, then we'd better
not aggregate them before the window function runs; and if the
aggregate is ordered, we can only partially aggregate the data if it
is already ordered in a way that is compatible with the final, desired
ordering. Another case we might need to watch out for is RLS. RLS
wants to apply all the security quals before any non-leakproof
functions, and pushing down the aggregation might push an aggregate
function past security quals. Perhaps there are other cases to worry
about as well; this is all I can think of at the moment.
But regardless of those kinds of cases, the basic idea that we want
the partially aggregate rows to join if and only if the unaggregated
rows would have joined seems exactly correct to me, and that provides
theoretical justification for deriving the surrogate grouping key
directly from the join quals. Woot!
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 29, 2024 at 10:26 AM Richard Guo <guofenglinux@gmail.com> wrote:
2. I think there might be techniques we could use to limit planning
effort at an earlier stage when the approach doesn't appear promising.
For example, if the proposed grouping column is already unique, the
exercise is pointless (I think). Ideally we'd like to detect that
without even creating the grouped_rel. But the proposed grouping
column might also be *mostly* unique. For example, consider a table
with a million rows and a column with 500,000 distinct values. I suspect it
will be difficult for partial aggregation to work out to a win in a
case like this, because I think that the cost of performing the
partial aggregation will not reduce the cost either of the final
aggregation or of the intervening join steps by enough to compensate.
It would be best to find a way to avoid generating a lot of rels and
paths in cases where there's really not much hope of a win.
One could, perhaps, imagine going further with this by postponing
eager aggregation planning until after regular paths have been built,
so that we have good cardinality estimates. Suppose the query joins a
single fact table to a series of dimension tables. The final plan thus
uses the fact table as the driving table and joins to the dimension
tables one by one. Do we really need to consider partial aggregation
at every level? Perhaps just where there's been a significant row
count reduction since the last time we tried it, but at the next level
the row count will increase again?
Maybe there are other heuristics we could use in addition or instead.
Yeah, one of my concerns with this work is that it can use
significantly more CPU time and memory during planning once enabled.
It would be great if we have some efficient heuristics to limit the
effort. I'll work on that next and see what happens.
In v13, the latest version, we can do:

    /* ... and initialize these targets */
    if (!init_grouping_targets(root, rel, target, agg_input,
                               &group_clauses, &group_exprs))
        return NULL;

    if (rel->reloptkind == RELOPT_BASEREL && group_exprs != NIL)
    {
        foreach_node(Var, var, group_exprs)
        {
            if (var->varno == rel->relid &&
                has_unique_index(rel, var->varattno))
                return NULL;
        }
    }
since in init_grouping_targets we have already asserted that
group_exprs is a list of Vars.
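For illustration, a hypothetical case this check would skip (table
names are made up): if the proposed grouping column is unique, partial
aggregation cannot reduce the row count, so building the grouped rel
is pointless.

CREATE TABLE ea_uniq_t1 (a int, b int);
CREATE TABLE ea_uniq_t2 (b int UNIQUE, c int);
SET enable_eager_aggregate TO on;
-- t2.b is unique, so PartialAgg(t2) grouped by b has exactly as many
-- rows as t2 itself; the check above would bail out before building
-- the grouped rel.
EXPLAIN (COSTS OFF)
SELECT t1.a, sum(t2.c)
FROM ea_uniq_t1 t1 JOIN ea_uniq_t2 t2 ON t1.b = t2.b
GROUP BY t1.a;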
--------------------------------------------------------------------------------
Also, in create_rel_agg_info, estimate_num_groups:

    result->group_exprs = group_exprs;
    result->grouped_rows = estimate_num_groups(root, result->group_exprs,
                                               rel->rows, NULL, NULL);
    /*
     * The grouped paths for the given relation are considered useful iff
     * the row reduction ratio is greater than EAGER_AGGREGATE_RATIO.
     */
    agg_info->agg_useful =
        (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
If there are too many Vars in group_exprs, result->grouped_rows will
be less accurate, and therefore agg_info->agg_useful will be less
accurate. Should we limit the number of Vars in group_exprs?
for example:
SET enable_eager_aggregate TO on;
drop table if exists eager_agg_t1,eager_agg_t2, eager_agg_t3;
CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
INSERT INTO eager_agg_t1 SELECT i % 100, i, i FROM generate_series(1, 5)i;
INSERT INTO eager_agg_t2 SELECT i % 10, i, i FROM generate_series(1, 5)i;
INSERT INTO eager_agg_t2 SELECT i % 10, i, i FROM generate_series(-4, -2)i;
explain(costs off, verbose, settings)
SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON
abs(t1.b) = abs(t2.b % 10 + t2.a) group by 1;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize HashAggregate
   Output: t1.a, avg(t2.c)
   Group Key: t1.a
   ->  Merge Join
         Output: t1.a, (PARTIAL avg(t2.c))
         Merge Cond: ((abs(((t2.b % 10) + t2.a))) = (abs(t1.b)))
         ->  Sort
               Output: t2.b, t2.a, (PARTIAL avg(t2.c)), (abs(((t2.b % 10) + t2.a)))
               Sort Key: (abs(((t2.b % 10) + t2.a)))
               ->  Partial HashAggregate
                     Output: t2.b, t2.a, PARTIAL avg(t2.c), abs(((t2.b % 10) + t2.a))
                     Group Key: t2.b, t2.a
                     ->  Seq Scan on public.eager_agg_t2 t2
                           Output: t2.a, t2.b, t2.c
         ->  Sort
               Output: t1.a, t1.b, (abs(t1.b))
               Sort Key: (abs(t1.b))
               ->  Seq Scan on public.eager_agg_t1 t1
                     Output: t1.a, t1.b, abs(t1.b)
 Settings: enable_eager_aggregate = 'on'
 Query Identifier: -734044107933323262
On Fri, Nov 1, 2024 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
[...]
Windowed functions and ordered aggregates may be a blocker to the
optimization, though: if the window function needs access to the
unaggregated rows, or even just needs to know how many rows there are,
then we'd better not aggregate them before the window function runs;
and if the aggregate is ordered, we can only partially aggregate the
data if it is already ordered in a way that is compatible with the
final, desired ordering. Another case we might need to watch out for
is RLS. RLS wants to apply all the security quals before any
non-leakproof functions, and pushing down the aggregation might push
an aggregate function past security quals. Perhaps there are other
cases to worry about as well; this is all I can think of at the moment.
Yeah, ordered aggregates could be a blocker. I think it might be best
to prevent the use of eager aggregation if root->numOrderedAggs > 0
for now.
I've been thinking about the window functions case, as Jian He also
mentioned it some time ago. It seems that the window function's
argument(s), as well as the PARTITION BY expression(s), are supposed
to appear in the GROUP BY clause or be used in an aggregate function.
And window functions are applied after the aggregation. So it seems
that there is no problem with window functions. But maybe I'm wrong.
I hadn't considered the RLS case before, but I think you're right. To
simplify things, maybe for now we can just prevent pushing down the
aggregation if the query applies some RLS policy, by checking
query->hasRowSecurity.
But regardless of those kinds of cases, the basic idea that we want
the partially aggregate rows to join if and only if the unaggregated
rows would have joined seems exactly correct to me, and that provides
theoretical justification for deriving the surrogate grouping key
directly from the join quals. Woot!
Thank you for the confirmation!
Thanks
Richard
On Wed, Nov 6, 2024 at 3:22 AM Richard Guo <guofenglinux@gmail.com> wrote:
Yeah, ordered aggregates could be a blocker. I think it might be best
to prevent the use of eager aggregation if root->numOrderedAggs > 0
for now.
I've been thinking about the window functions case, as Jian He also
mentioned it some time ago. It seems that the window function's
argument(s), as well as the PARTITION BY expression(s), are supposed
to appear in the GROUP BY clause or be used in an aggregate function.
And window functions are applied after the aggregation. So it seems
that there is no problem with window functions. But maybe I'm wrong.
I hadn't considered the RLS case before, but I think you're right. To
simplify things, maybe for now we can just prevent pushing down the
aggregation if the query applies some RLS policy, by checking
query->hasRowSecurity.
Particularly for the RLS case, I think we should be reluctant to
disable the optimization entirely just because there might be a
problem. We have existing infrastructure to keep security quals from
being applied too late, and I suspect it's mostly applicable to this
situation. Therefore, I suspect it might not be very much work to
allow this optimization even when RLS is in use, as long as it
wouldn't actually cause a violation of the RLS rules. If, upon
investigation, you find some reason why we can't assess accurately
whether pushing down a specific aggregate is a problem, then the
approach that you propose is reasonable, but I think the question
should be investigated. I don't like the idea of giving up on
RLS-using queries completely without even trying to figure out how
difficult it would be to do the right thing.
I have similar but weaker feelings about ordered aggregates. Consider:
explain select t1.id, array_agg(t2.v order by t3.o) from t1, t2, t3
where t1.id = t2.id and t2.id = t3.id group by 1;
We can't partially aggregate t2, but we could partially aggregate t2
join t3. So this case is a lot like:
explain select t1.id, array_agg(t2.v + t3.o) from t1, t2, t3 where
t1.id = t2.id and t2.id = t3.id group by 1;
I don't know whether the patch handles the second case correctly right
now, but that certainly seems like a case that has to work. We must be
able to determine in such a case that the partial aggregate has to be
above the t2-t3 join. And if we can determine that, then why can't
basically the same logic handle the first case? There are certainly
some differences. The first case not only needs the aggregate to be
above the t2-t3 join but also needs the input data to be sorted, so we
don't get the right behavior for ordered aggregates just by using the
contents of the ORDER BY clause to decide at what level the partial
aggregate can be applied. On the other hand, if we're looking at paths
for (T2 JOIN T3) to build paths for PartialAgg(T2 join T3), the
stipulation that we need to use ordered paths or sorting doesn't make
the code very much more complicated. I'm open to the conclusion that
this is too much complexity but I'd rather not dismiss it instantly.
Regarding window functions, you've said a few times now that you don't
see the problem, but the more I think about it, the more obvious it
seems to me that there are BIG problems. Consider this example from
the documentation:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname)
FROM empsalary;
I get a query plan like this:
WindowAgg (cost=83.46..104.37 rows=1200 width=72)
-> Sort (cost=83.37..86.37 rows=1200 width=40)
Sort Key: depname
-> Seq Scan on empsalary (cost=0.00..22.00 rows=1200 width=40)
Already we see warning signs here. The WindowAgg node needs the input
rows to be ordered, because it's going to average the salary for each
group of rows with the same depname. So we have the same kinds of
issues that we do for ordered aggregates, at the very least. But
window aggregates are not just ordering-sensitive. They are also
empowered to look at other rows in the frame. Consider the following
example:
create table names (n text);
insert into names values ('Tom'), ('Dick'), ('Harry');
select n, lag(n, 1) over () from names;
The result is:
n | lag
-------+------
Tom |
Dick | Tom
Harry | Dick
I think it is pretty obvious that if any form of partial aggregation
had been applied here, it would be impossible to correctly evaluate
lag(). Or am I missing something?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Nov 6, 2024 at 11:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 6, 2024 at 3:22 AM Richard Guo <guofenglinux@gmail.com> wrote:
Yeah, ordered aggregates could be a blocker. I think it might be best
to prevent the use of eager aggregation if root->numOrderedAggs > 0
for now.
I've been thinking about the window functions case, as Jian He also
mentioned it some time ago. It seems that the window function's
argument(s), as well as the PARTITION BY expression(s), are supposed
to appear in the GROUP BY clause or be used in an aggregate function.
And window functions are applied after the aggregation. So it seems
that there is no problem with window functions. But maybe I'm wrong.
I hadn't considered the RLS case before, but I think you're right. To
simplify things, maybe for now we can just prevent pushing down the
aggregation if the query applies some RLS policy, by checking
query->hasRowSecurity.
Particularly for the RLS case, I think we should be reluctant to
disable the optimization entirely just because there might be a
problem. We have existing infrastructure to keep security quals from
being applied too late, and I suspect it's mostly applicable to this
situation. Therefore, I suspect it might not be very much work to
allow this optimization even when RLS is in use, as long as it
wouldn't actually cause a violation of the RLS rules. If, upon
investigation, you find some reason why we can't assess accurately
whether pushing down a specific aggregate is a problem, then the
approach that you propose is reasonable, but I think the question
should be investigated. I don't like the idea of giving up on
RLS-using queries completely without even trying to figure out how
difficult it would be to do the right thing.
That makes sense. I shouldn’t be lazy and simply disable this
optimization for the RLS case. I'm not familiar with the RLS stuff
but I’ll take some time to investigate it further.
I have similar but weaker feelings about ordered aggregates. Consider:
explain select t1.id, array_agg(t2.v order by t3.o) from t1, t2, t3
where t1.id = t2.id and t2.id = t3.id group by 1;
We can't partially aggregate t2, but we could partially aggregate t2
join t3. So this case is a lot like:
explain select t1.id, array_agg(t2.v + t3.o) from t1, t2, t3 where
t1.id = t2.id and t2.id = t3.id group by 1;
I don't know whether the patch handles the second case correctly right
now, but that certainly seems like a case that has to work. We must be
able to determine in such a case that the partial aggregate has to be
above the t2-t3 join. And if we can determine that, then why can't
basically the same logic handle the first case? There are certainly
some differences. The first case not only needs the aggregate to be
above the t2-t3 join but also needs the input data to be sorted, so we
don't get the right behavior for ordered aggregates just by using the
contents of the ORDER BY clause to decide at what level the partial
aggregate can be applied. On the other hand, if we're looking at paths
for (T2 JOIN T3) to build paths for PartialAgg(T2 join T3), the
stipulation that we need to use ordered paths or sorting doesn't make
the code very much more complicated. I'm open to the conclusion that
this is too much complexity but I'd rather not dismiss it instantly.
It seems to me that a partially aggregated row might need to be
combined with other partially aggregated rows after the join, if they
belong to the same t1.id group. IIUC, this implies that we cannot
perform partial aggregation on ordered input before the join,
otherwise we may get incorrect results during the final aggregation
phase.
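As a toy illustration (not from the patch): each partial state below
is internally ordered, but simply concatenating two such states during
final aggregation would not preserve the requested ordering.

-- each input array is sorted, but their concatenation is not
select array_cat(array[1,5], array[2,4]);  -- {1,5,2,4}, not {1,2,4,5}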
Regarding window functions, you've said a few times now that you don't
see the problem, but the more I think about it, the more obvious it
seems to me that there are BIG problems. Consider this example from
the documentation:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname)
FROM empsalary;
I get a query plan like this:
WindowAgg (cost=83.46..104.37 rows=1200 width=72)
-> Sort (cost=83.37..86.37 rows=1200 width=40)
Sort Key: depname
-> Seq Scan on empsalary (cost=0.00..22.00 rows=1200 width=40)
Already we see warning signs here. The WindowAgg node needs the input
rows to be ordered, because it's going to average the salary for each
group of rows with the same depname. So we have the same kinds of
issues that we do for ordered aggregates, at the very least. But
window aggregates are not just ordering-sensitive. They are also
empowered to look at other rows in the frame. Consider the following
example:
create table names (n text);
insert into names values ('Tom'), ('Dick'), ('Harry');
select n, lag(n, 1) over () from names;
The result is:
n | lag
-------+------
Tom |
Dick | Tom
Harry | Dick
I think it is pretty obvious that if any form of partial aggregation
had been applied here, it would be impossible to correctly evaluate
lag(). Or am I missing something?
Hmm, currently we only consider grouped aggregation for eager
aggregation. For grouped aggregation, the window function's
arguments, as well as the PARTITION BY expressions, must appear in the
GROUP BY clause. That is to say, the depname column in the first
query, or the n column in the second query, will not be aggregated
into the partial groups. Instead, they will remain as they are as
input for the WindowAgg nodes. It seems to me that this ensures
that we're good with window functions. But maybe I'm wrong.
Thanks
Richard
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
I have similar but weaker feelings about ordered aggregates. Consider:
explain select t1.id, array_agg(t2.v order by t3.o) from t1, t2, t3
where t1.id = t2.id and t2.id = t3.id group by 1;
It seems to me that a partially aggregated row might need to be
combined with other partially aggregated rows after the join, if they
belong to the same t1.id group. IIUC, this implies that we cannot
perform partial aggregation on ordered input before the join,
otherwise we may get incorrect results during the final aggregation
phase.
Hmm, I think you're right. I think that if the t1.id=t2.id join is one
to one, then it would work out fine, but that need not be the case.
Hmm, currently we only consider grouped aggregation for eager
aggregation. For grouped aggregation, the window function's
arguments, as well as the PARTITION BY expressions, must appear in the
GROUP BY clause. That is to say, the depname column in the first
query, or the n column in the second query, will not be aggregated
into the partial groups. Instead, they will remain as they are as
input for the WindowAgg nodes. It seems to me that this ensures
that we're good with window functions. But maybe I'm wrong.
Unfortunately, I don't know what you mean by grouped aggregation. I
think of grouping and aggregation as synonyms, pretty much.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Nov 12, 2024 at 1:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
Hmm, currently we only consider grouped aggregation for eager
aggregation. For grouped aggregation, the window function's
arguments, as well as the PARTITION BY expressions, must appear in the
GROUP BY clause. That is to say, the depname column in the first
query, or the n column in the second query, will not be aggregated
into the partial groups. Instead, they will remain as they are as
input for the WindowAgg nodes. It seems to me that this ensures
that we're good with window functions. But maybe I'm wrong.
Unfortunately, I don't know what you mean by grouped aggregation. I
think of grouping and aggregation as synonyms, pretty much.
Ah, sorry for the confusion. By "grouped aggregation", I mean
aggregation with a GROUP BY clause, where we produce a result row for
each group. This contrasts with plain aggregation, where there is a
single result row for the whole query.
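For instance (illustrative queries against the system catalogs):

-- plain aggregation: a single result row for the whole query
select count(*) from pg_class;

-- grouped aggregation: one result row per group
select relkind, count(*) from pg_class group by relkind;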
Thanks
Richard
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
Hmm, currently we only consider grouped aggregation for eager
aggregation. For grouped aggregation, the window function's
arguments, as well as the PARTITION BY expressions, must appear in the
GROUP BY clause. That is to say, the depname column in the first
query, or the n column in the second query, will not be aggregated
into the partial groups. Instead, they will remain as they are as
input for the WindowAgg nodes. It seems to me that this ensures
that we're good with window functions. But maybe I'm wrong.
Returning to this point now that I understand what you meant by
grouped aggregation:
I still don't understand how you expect to be able to evaluate
functions like LEAD() and LAG() if any form of partial aggregation has
been done.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Dec 4, 2024 at 11:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Nov 10, 2024 at 7:52 PM Richard Guo <guofenglinux@gmail.com> wrote:
Hmm, currently we only consider grouped aggregation for eager
aggregation. For grouped aggregation, the window function's
arguments, as well as the PARTITION BY expressions, must appear in the
GROUP BY clause. That is to say, the depname column in the first
query, or the n column in the second query, will not be aggregated
into the partial groups. Instead, they will remain as they are as
input for the WindowAgg nodes. It seems to me that this ensures
that we're good with window functions. But maybe I'm wrong.
Returning to this point now that I understand what you meant by
grouped aggregation:
I still don't understand how you expect to be able to evaluate
functions like LEAD() and LAG() if any form of partial aggregation has
been done.
In grouped aggregation, the non-aggregate arguments of the window
function must appear in the GROUP BY clause, so they will not be
aggregated into the partial groups. It seems to me that this ensures
that they remain available as valid inputs for the window function.
For the Aggref arguments of the window function, their final values
are calculated in the Finalize Agg node, meaning they, too, are good
to be used as inputs for the window function.
As an example, please consider
create table tbl (a int, b int, c int);
insert into tbl select i%3, i%3, i%3 from generate_series(1,1000)i;
analyze tbl;
explain (verbose, costs off)
select lead(t1.a+sum(t2.b)) over (), sum(t2.c) from
tbl t1 join tbl t2 on t1.b = t2.b group by t1.a;
QUERY PLAN
------------------------------------------------------------------------------
WindowAgg
Output: lead((t1.a + (sum(t2.b)))) OVER (?), (sum(t2.c)), t1.a
-> Finalize HashAggregate
Output: t1.a, sum(t2.b), sum(t2.c)
Group Key: t1.a
-> Hash Join
Output: t1.a, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.tbl t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.b, (PARTIAL sum(t2.b)), (PARTIAL sum(t2.c))
-> Partial HashAggregate
Output: t2.b, PARTIAL sum(t2.b), PARTIAL sum(t2.c)
Group Key: t2.b
-> Seq Scan on public.tbl t2
Output: t2.a, t2.b, t2.c
(17 rows)
It seems to me that both 't1.a' and 'sum(t2.b)' are valid inputs for
LEAD(), even though we have performed partial aggregation.
Am I missing something?
Thanks
Richard
On Fri, Nov 1, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com> wrote:
Perhaps we could introduce a GroupPathInfo into the Path structure to
store common information for a grouped path, such as the location of
the partial aggregation (which could be the relids of the relation on
top of which we place the partial aggregation) and the estimated
rowcount for this grouped path, similar to how ParamPathInfo functions
for parameterized paths. Then we should be able to compare the
grouped paths of the same location apples to apples. I haven’t
thought this through in detail yet, though.
After thinking over this again, I think one difference from the
parameterized path case is that, for a parameterized path, the fewer
the required outer rels, the better, as more outer rels imply more
join restrictions. Therefore, the number of required outer rels
serves as a criterion when comparing paths in add_path().
For a grouped path, however, we don't concern ourselves with the
location of the partial aggregation. What matters is whether one
grouped path is preferable to another based on the current merits of
add_path(). Therefore, I think it's acceptable to compare grouped
paths for the same grouped rel, regardless of where the partial
aggregation is placed.
Note that non-grouped and grouped paths will not appear in the same
RelOptInfo. All paths for a grouped rel are grouped paths, meaning
there is a partial aggregation node somewhere in the path tree.
Similarly, all paths for a non-grouped rel are non-grouped paths.
That is to say, it is not possible to compare a grouped path with a
non-grouped path.
Two different grouped paths for the same grouped rel can have very
different rowcount estimates, depending on where the partial
aggregation is placed in the path tree. Therefore, for a grouped
join path, we have to calculate its rowcount estimate using its outer
and inner paths, as we do in set_joinpath_size(). This is
similar to what we do for parameterized paths: two different
parameterized paths for the same rel can also have very different
rowcount estimates, depending on which outer rels supply the
parameters. So we calculate the rowcount estimates for parameterized
join paths for each different parameterization in
get_parameterized_joinrel_size().
set_joinpath_size() adds a special case into final_cost_nestloop(),
final_cost_mergejoin(), and final_cost_hashjoin(). For non-grouped
paths, it adds one additional check, IS_GROUPED_REL(rel), which is
defined as
#define IS_GROUPED_REL(rel) ((rel)->agg_info != NULL)
I doubt that this additional simple pointer check will cause general
performance regressions.
Yeah, this patch does not get it correct here. Basically the logic is
that for the partial aggregation pushed down to a non-aggregated
relation, we need to consider all columns of that relation involved in
upper join clauses and include them in the grouping keys. Currently,
the patch only checks whether a column is involved in upper join
clauses but does not verify how the column is used. We need to ensure
that the operator used in the join clause is at least compatible with
the grouping operator; otherwise, the grouping operator might
interpret the values as the same while the join operator sees them as
different.
Hmm, I think we can prevent this issue from occurring if we ensure
that "equality implies image equality" for each grouping key used in
partial aggregation. In such cases, if the grouping operator in
partial aggregation treats two values as equal, the join operator in
the upper join clause must also treat them as equal.
On the other hand, it’s possible that the grouping operator treats two
values as different, while the join operator treats them as equal.
This is fine, as the different partial groups will be combined during
the final aggregation.
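As an illustration, type numeric is a well-known case where equality
does not imply image equality, so a numeric grouping key would
presumably fail this check:

select 0.0::numeric = 0.00::numeric;            -- true
select 0.0::numeric::text, 0.00::numeric::text; -- '0.0' vs '0.00'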
Attached is the patch rebased on the latest master. It refines the
theoretical justification for the correctness of this transformation
in README and commit message. It also adds the check for image
equality for all grouping keys used in partial aggregation, and fixes
the issue reported by Jian. It does not yet handle the RLS case
though.
Thanks
Richard
Attachments:
v14-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From 8d3955e5a3c5bfa5b5de730733562b2c8e1c671b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v14] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in a dedicated list.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
'destiny', which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many grouped relations, we
introduce a RelInfoList structure, which encapsulates both a list and
a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for grouped relations.
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/README | 80 +
src/backend/optimizer/geqo/geqo_eval.c | 98 +-
src/backend/optimizer/path/allpaths.c | 455 +++++-
src/backend/optimizer/path/costsize.c | 95 +-
src/backend/optimizer/path/joinrels.c | 147 ++
src/backend/optimizer/plan/initsplan.c | 258 ++++
src/backend/optimizer/plan/planmain.c | 17 +-
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 47 +-
src/backend/optimizer/util/relnode.c | 761 +++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 148 +-
src/include/optimizer/pathnode.h | 7 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 7 +-
23 files changed, 3646 insertions(+), 158 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index c0810fbd7c..0063f3942d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6089,7 +6089,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..7a6de25f6e 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,83 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys, using compatible
+operators. This is essential to ensure that an aggregated row from the partial
+aggregation matches the other side of the join if and only if each row in the
+partial group does. This ensures that all rows within the same partial group
+share the same 'destiny', which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest
+non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final paths will compete in the usual
+way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..cdc9543135 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list and
+ * root->grouped_rel_list, which may or may not already contain some
+ * entries. The newly added entries will be recycled by the
+ * MemoryContextDelete below, so we must ensure that each list within the
+ * RelInfoList structures is restored to its former state before exiting.
+ * We can do this by truncating each list to its original length. NOTE
+ * this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables within the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels or
+ * grouped rels, which we very likely are, new hash tables will get built
+ * and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel = save_relinfolist(root->grouped_rel_list);
+
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list and grouped_rel_list to its
+ * former state, and put back original hashtables if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(root->grouped_rel_list, &save_grouped_rel);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 172edb643a..0ac2c2d507 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3306,6 +3394,318 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ /*
+ * If the grouped paths for the given relation are not considered useful,
+ * do not bother to generate them.
+ */
+ if (!agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3414,9 +3814,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
@@ -3465,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3485,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4353,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c36687aa4d..c093b47af4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -180,6 +180,8 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost);
+static void set_joinpath_size(PlannerInfo *root, JoinPath *jpath,
+ SpecialJoinInfo *sjinfo);
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -3370,19 +3372,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
if (inner_path_rows <= 0)
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* cost of inner-relation source data (we already dealt with outer rel) */
@@ -3867,19 +3857,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/*
* Compute cost of the mergequals and qpquals (other restriction clauses)
@@ -4299,19 +4277,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
path->jpath.path.disabled_nodes = workspace->disabled_nodes;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* mark the path with estimated # of batches */
path->num_batches = numbatches;
@@ -5061,6 +5027,57 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
*qpqual_cost = baserel->baserestrictcost;
}
+/*
+ * set_joinpath_size
+ * Set the correct row estimate for the given join path.
+ *
+ * 'jpath' is the join path under consideration.
+ * 'sjinfo' is any SpecialJoinInfo relevant to this join.
+ *
+ * Note that for a grouped join relation, its paths could have very different
+ * rowcount estimates, so we need to calculate the rowcount estimate using the
+ * outer path and inner path of the given join path.
+ */
+static void
+set_joinpath_size(PlannerInfo *root, JoinPath *jpath, SpecialJoinInfo *sjinfo)
+{
+ if (IS_GROUPED_REL(jpath->path.parent))
+ {
+ Path *outer_path = jpath->outerjoinpath;
+ Path *inner_path = jpath->innerjoinpath;
+
+ /*
+ * Estimate the number of rows of this grouped join path as the sizes
+ * of the outer and inner paths times the selectivity of the clauses
+ * that have ended up at this join node.
+ */
+ jpath->path.rows = calc_joinrel_size_estimate(root,
+ jpath->path.parent,
+ outer_path->parent,
+ inner_path->parent,
+ outer_path->rows,
+ inner_path->rows,
+ sjinfo,
+ jpath->joinrestrictinfo);
+ }
+ else
+ {
+ if (jpath->path.param_info)
+ jpath->path.rows = jpath->path.param_info->ppi_rows;
+ else
+ jpath->path.rows = jpath->path.parent->rows;
+
+ /* For partial paths, scale row estimate. */
+ if (jpath->path.parallel_workers > 0)
+ {
+ double parallel_divisor = get_parallel_divisor(&jpath->path);
+
+ jpath->path.rows =
+ clamp_row_est(jpath->path.rows / parallel_divisor);
+ }
+ }
+}
+
/*
* compute_semi_anti_join_factors
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..20698e48f0 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +889,141 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+ bool yet_to_add = false;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ /*
+ * If the grouped paths for the given join relation are considered
+ * useful, add the grouped relation we just built to the PlannerInfo
+ * to make it available for further joining or for acting as the upper
+ * rel representing the result of partial aggregation. Otherwise, we
+ * need to postpone the decision on adding the grouped relation to the
+ * PlannerInfo, as it depends on whether we can generate any grouped
+ * paths by joining the given pair of input relations.
+ */
+ if (agg_info->agg_useful)
+ add_grouped_rel(root, rel_grouped);
+ else
+ yet_to_add = true;
+ }
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids);
+ rel2_grouped = find_grouped_rel(root, rel2->relids);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /*
+ * Joining two grouped relations is currently not supported. Grouping one
+ * side would alter the multiplicity of the other side's aggregate transition
+ * states in the final aggregation input. While this issue could be
+ * addressed by adjusting the transition states, it is not deemed
+ * worthwhile for now.
+ */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * It shouldn't happen that we have marked rel1_grouped as dummy in
+ * populate_joinrel_with_paths due to provably constant-false join
+ * restrictions, hence we wouldn't end up with a plan that has Aggref
+ * in non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should not have marked rel2_grouped as
+ * dummy due to provably constant-false join restrictions; otherwise we
+ * could end up with a plan that has an Aggref in a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+
+ /*
+ * Since we have generated grouped paths by joining the given pair of
+ * input relations, add the grouped relation to the PlannerInfo if we have
+ * not already done so.
+ */
+ if (yet_to_add)
+ add_grouped_rel(root, rel_grouped);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1816,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 5f3908be51..1f5e670dcc 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +631,261 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so, collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ list_free_deep(root->agg_clause_list);
+ root->agg_clause_list = NIL;
+
+ list_free(root->tlist_vars);
+ root->tlist_vars = NIL;
+
+ return;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ root->tlist_vars = list_append_unique(root->tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
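+ *
+ * For example, an upper qual such as "n::text = '0.0'" on a numeric
+ * grouping key n could yield wrong results if 0 and 0.0 were merged
+ * into a single group during partial aggregation.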
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator from the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 735560e8ca..22df968629 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,12 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list = makeNode(RelInfoList);
+ root->grouped_rel_list->items = NIL;
+ root->grouped_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -76,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -260,6 +267,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a0a2de7ee4..049bb679f0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -229,7 +229,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3916,9 +3915,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4000,23 +3997,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6906,16 +6896,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7028,7 +7044,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7046,7 +7062,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7054,7 +7070,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7096,19 +7112,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7158,6 +7172,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may have already been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * The partially_grouped_rel created by eager aggregation may be dummy.
+ * In that case we just set it to NULL; the logic below may create it
+ * again if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7181,19 +7210,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may have already been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7202,6 +7239,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 45e8b74f94..0e4c7b2b2d 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* The expressions must be translated regardless of sortgrouprefs */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fc97bf6ee2..673e181b32 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
*
+ * For grouped relations, cheapest_parameterized_paths also always includes
+ * the fewest-row unparameterized path, if there is one.  Different paths of
+ * a grouped relation can have very different row counts, and in some cases
+ * the cheapest-total unparameterized path is not the one with the fewest
+ * rows.
+ *
* This is normally called only after we've finished constructing the path
* list for the rel node.
*/
@@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel)
Path *cheapest_startup_path;
Path *cheapest_total_path;
Path *best_param_path;
+ Path *fewest_row_path;
List *parameterized_paths;
ListCell *p;
@@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel)
elog(ERROR, "could not devise a query plan for the given query");
cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
+ fewest_row_path = NULL;
parameterized_paths = NIL;
foreach(p, parent_rel->pathlist)
@@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path == NULL)
{
cheapest_startup_path = cheapest_total_path = path;
+ if (IS_GROUPED_REL(parent_rel))
+ fewest_row_path = path;
continue;
}
@@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel)
compare_pathkeys(cheapest_total_path->pathkeys,
path->pathkeys) == PATHKEYS_BETTER2))
cheapest_total_path = path;
+
+ /*
+ * Find the fewest-row unparameterized path for a grouped
+ * relation. If we find two paths of the same row count, try to
+ * keep the one with the cheaper total cost; if the costs are
+ * identical, keep the better-sorted one.
+ */
+ if (IS_GROUPED_REL(parent_rel))
+ {
+ if (fewest_row_path->rows > path->rows)
+ fewest_row_path = path;
+ else if (fewest_row_path->rows == path->rows)
+ {
+ cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST);
+ if (cmp > 0 ||
+ (cmp == 0 &&
+ compare_pathkeys(fewest_row_path->pathkeys,
+ path->pathkeys) == PATHKEYS_BETTER2))
+ fewest_row_path = path;
+ }
+ }
}
}
@@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path)
parameterized_paths = lcons(cheapest_total_path, parameterized_paths);
+ /* Add fewest-row unparameterized path, if any, to parameterized_paths */
+ if (fewest_row_path && fewest_row_path != cheapest_total_path)
+ parameterized_paths = lcons(fewest_row_path, parameterized_paths);
+
/*
* If there is no unparameterized path, use the best parameterized path as
* cheapest_total_path (but not as cheapest_startup_path).
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f96573eb5d..6282c10da6 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,19 +29,27 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry in the hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelHashEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelHashEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -83,7 +93,17 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
-
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+
+/* Minimum row reduction ratio at which a grouped path is considered useful */
+#define EAGER_AGGREGATE_RATIO 0.5
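+
+/*
+ * For example, with the ratio at 0.5, a relation estimated to produce 1000
+ * input rows must be estimated to produce at most 500 groups for its
+ * grouped paths to be considered useful.
+ */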
/*
* setup_simple_rel_arrays
@@ -276,6 +296,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +427,99 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We cannot reach here unless aggregate expressions and grouping
+ * expressions are available.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If the grouped paths for the given base relation are not considered
+ * useful, do not build the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -479,11 +593,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +605,46 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelHashEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing RelOptInfos */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry corresponding to 'relids'.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +654,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +682,28 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->grouped_rel_list, relids);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -619,31 +754,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
+ * add_rel_info
+ * Add given relation to the list, and also add it to the auxiliary
+ * hashtable if there is one.
*/
static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
/* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
+ if (list->hash)
{
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = joinrel;
+ hentry->rel = rel;
}
}
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ add_rel_info(root->grouped_rel_list, rel);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -755,6 +912,7 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1097,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2677,511 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel has already
+ * been created if that was possible.  So we can just reuse the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) rel_grouped->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the estimated row reduction ratio is at least EAGER_AGGREGATE_RATIO.
+ */
+ agg_info->agg_useful =
+ (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * estimated row reduction ratio is at least EAGER_AGGREGATE_RATIO.
+ */
+ result->agg_useful =
+ (result->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
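+ *
+ * For example, sum(t1.x + t2.y) can be evaluated no lower than the
+ * join {t1, t2}, so grouping t1 alone or t2 alone is not possible.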
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as a grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
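+ *
+ * For example, in "SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i", b.y appears only within the Aggref, so it must be
+ * emitted by the input paths of the partial aggregation but not by the
+ * partially grouped paths themselves.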
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j must be used as a grouping key, because otherwise it would not
+ * be available above the partial aggregation to evaluate the join clause.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final output. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8cf1afbad2..95bd80c4dd 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a2ac7575ca..154fc5b1fa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -416,6 +416,7 @@
#enable_tidscan = on
#enable_group_by_reordering = on
#enable_distinct_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0759e00e96..6a0572d9c7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relations.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,16 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+
+ /*
+ * grouped_rel_list is a list of all grouped-relation RelOptInfos we have
+ * considered in this planning run. This is only used by eager
+ * aggregation.
+ */
+ RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -373,6 +393,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -998,6 +1027,12 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1071,6 +1106,68 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful.
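+ *
+ * For example, for "SELECT a.i, avg(b.y) FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i", the grouped relation for b would have "target" contain
+ * {b.j, PARTIAL avg(b.y)} and "agg_input" contain {b.j, b.y}.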
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3145,6 +3242,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in the targetlist and HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 1035e6560c..d3c05a61ba 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,10 +314,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -353,4 +359,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..a189b7f18c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0b6f0f7969..49614dbd75 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -75,6 +75,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY using another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..6370504377 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -171,7 +172,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1edd9e45eb..4fc210e2ef 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Scan one table, aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Join two tables, aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY using another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..ddd669b467 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1064,6 +1065,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -1296,7 +1298,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2379,13 +2380,17 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
+RelHashEntry
RelIdCacheEnt
RelIdToTypeIdCacheEntry
RelInfo
RelInfoArr
+RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Tue, Dec 17, 2024 at 12:42 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Attached is the patch rebased on the latest master. It refines the
> theoretical justification for the correctness of this transformation
> in README and commit message. It also adds the check for image
> equality for all grouping keys used in partial aggregation, and fixes
> the issue reported by Jian. It does not yet handle the RLS case
> though.
I've looked at the RLS case. AFAIU we want to prevent any
non-leakproof aggregate functions from being pushed down past
securityQuals. I added a check in create_agg_clause_infos to ensure
that no aggregation is pushed down if securityQuals are present along
with any non-leakproof aggregate functions. I know this might be
overly strict, but for now, I want to focus on the eager aggregation
transformation itself. We can relax this restriction in subsequent
patches after this already large one.
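The shape of that check is roughly the following (an illustrative
sketch only, not the exact code; "rte" and "aggref" here stand for the
relation's RangeTblEntry and the aggregate expression under
consideration):

    /*
     * Don't push a partial aggregate below security barrier quals if
     * it contains any non-leakproof function.  contain_leaked_vars()
     * checks for calls to functions that are not marked leakproof.
     */
    if (rte->securityQuals != NIL &&
        contain_leaked_vars((Node *) aggref))
        return;     /* give up on eager aggregation here */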
Attached is the latest patch, which also includes some cosmetic
tweaks. I am hoping to get this pushed by the end of
January, so that I have enough time to react to any bugs before
the feature freeze.
Thanks
Richard
Attachments:
v15-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From 12f11079c46ee5d7ec9a285bb0d667fd461703ed Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v15] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in a dedicated list.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
'destiny', which is crucial for maintaining correctness.
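As an illustration with hypothetical tables, consider:

  SELECT a.i, sum(b.y)
  FROM a JOIN b ON a.i = b.j AND a.k = b.l
  GROUP BY a.i;

If the partial aggregation over b grouped only by b.j, rows of b with
different values of b.l could end up in the same partial group, and
that group might match a row of a on (a.i = b.j) while only some of
its rows satisfy (a.k = b.l). Grouping by both b.j and b.l avoids
this.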
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
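For example, with a hypothetical schema:

  SELECT b.j, count(*)
  FROM a LEFT JOIN b ON a.i = b.j
  GROUP BY b.j;

Every row of a without a match in b produces a NULL-extended row, and
all such rows land in the (b.j IS NULL) group, each contributing to
its count(*). A partial count(*) computed over b alone never sees
those rows, so finalizing it above the join would miscount that
group.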
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many grouped relations, we
introduce a RelInfoList structure, which encapsulates both a list and
a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for grouped relations.
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
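A minimal way to exercise the feature, assuming tables like the ones
in the example above:

  SET enable_eager_aggregate = on;
  EXPLAIN (COSTS OFF)
  SELECT a.i, avg(b.y)
  FROM a JOIN b ON a.i = b.j
  GROUP BY a.i;

When the planner judges the pushdown worthwhile, this shows the
Finalize/Partial aggregate split from the plan above.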
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 80 +
src/backend/optimizer/geqo/geqo_eval.c | 98 +-
src/backend/optimizer/path/allpaths.c | 455 +++++-
src/backend/optimizer/path/costsize.c | 95 +-
src/backend/optimizer/path/joinrels.c | 141 ++
src/backend/optimizer/plan/initsplan.c | 273 ++++
src/backend/optimizer/plan/planmain.c | 17 +-
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 47 +-
src/backend/optimizer/util/relnode.c | 758 +++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 148 +-
src/include/optimizer/pathnode.h | 7 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 7 +-
24 files changed, 3667 insertions(+), 158 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index cf56434118..7bb36a52d1 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6089,7 +6089,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fbdd6ce574..3d78a5875f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5382,6 +5382,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index f341d9f303..45236ca46b 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,83 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys, using compatible
+operators. This is essential to ensure that an aggregated row from the partial
+aggregation matches the other side of the join if and only if each row in the
+partial group does. This ensures that all rows within the same partial group
+share the same 'destiny', which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest or
+suitably-sorted non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final paths will compete in the usual
+way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..cdc9543135 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list and
+ * root->grouped_rel_list, which may or may not already contain some
+ * entries. The newly added entries will be recycled by the
+ * MemoryContextDelete below, so we must ensure that each list within the
+ * RelInfoList structures is restored to its former state before exiting.
+ * We can do this by truncating each list to its original length. NOTE
+ * this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables within the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels or
+ * grouped rels, which we very likely are, new hash tables will get built
+ * and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel = save_relinfolist(root->grouped_rel_list);
+
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list and grouped_rel_list to its
+ * former state, and put back original hashtables if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(root->grouped_rel_list, &save_grouped_rel);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 172edb643a..13228377a5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3306,6 +3394,318 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ /*
+ * If the grouped paths for the given relation are not considered useful,
+ * do not bother to generate them.
+ */
+ if (!agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3414,9 +3814,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
@@ -3465,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3485,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4353,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c36687aa4d..c093b47af4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -180,6 +180,8 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost);
+static void set_joinpath_size(PlannerInfo *root, JoinPath *jpath,
+ SpecialJoinInfo *sjinfo);
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -3370,19 +3372,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
if (inner_path_rows <= 0)
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* cost of inner-relation source data (we already dealt with outer rel) */
@@ -3867,19 +3857,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/*
* Compute cost of the mergequals and qpquals (other restriction clauses)
@@ -4299,19 +4277,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
path->jpath.path.disabled_nodes = workspace->disabled_nodes;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* mark the path with estimated # of batches */
path->num_batches = numbatches;
@@ -5061,6 +5027,57 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
*qpqual_cost = baserel->baserestrictcost;
}
+/*
+ * set_joinpath_size
+ * Set the correct row estimate for the given join path.
+ *
+ * 'jpath' is the join path under consideration.
+ * 'sjinfo' is any SpecialJoinInfo relevant to this join.
+ *
+ * Note that for a grouped join relation, its paths could have very different
+ * rowcount estimates, so we need to calculate the rowcount estimate using the
+ * outer path and inner path of the given join path.
+ */
+static void
+set_joinpath_size(PlannerInfo *root, JoinPath *jpath, SpecialJoinInfo *sjinfo)
+{
+ if (IS_GROUPED_REL(jpath->path.parent))
+ {
+ Path *outer_path = jpath->outerjoinpath;
+ Path *inner_path = jpath->innerjoinpath;
+
+ /*
+ * Estimate the number of rows of this grouped join path as the sizes
+ * of the outer and inner paths times the selectivity of the clauses
+ * that have ended up at this join node.
+ */
+ jpath->path.rows = calc_joinrel_size_estimate(root,
+ jpath->path.parent,
+ outer_path->parent,
+ inner_path->parent,
+ outer_path->rows,
+ inner_path->rows,
+ sjinfo,
+ jpath->joinrestrictinfo);
+ }
+ else
+ {
+ if (jpath->path.param_info)
+ jpath->path.rows = jpath->path.param_info->ppi_rows;
+ else
+ jpath->path.rows = jpath->path.parent->rows;
+
+ /* For partial paths, scale row estimate. */
+ if (jpath->path.parallel_workers > 0)
+ {
+ double parallel_divisor = get_parallel_divisor(&jpath->path);
+
+ jpath->path.rows =
+ clamp_row_est(jpath->path.rows / parallel_divisor);
+ }
+ }
+}
+
/*
* compute_semi_anti_join_factors
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 7db5e30eef..248aa3fffe 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +889,135 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+ bool yet_to_add = false;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ /*
+ * If the grouped paths for the given join relation are considered
+ * useful, add the grouped relation we just built to the PlannerInfo
+ * to make it available for further joining or for acting as the upper
+ * rel representing the result of partial aggregation. Otherwise, we
+ * need to postpone the decision on adding the grouped relation to the
+ * PlannerInfo, as it depends on whether we can generate any grouped
+ * paths by joining the given pair of input relations.
+ */
+ if (agg_info->agg_useful)
+ add_grouped_rel(root, rel_grouped);
+ else
+ yet_to_add = true;
+ }
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids);
+ rel2_grouped = find_grouped_rel(root, rel2->relids);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /* Joining two grouped relations is currently not supported */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should not have marked rel1_grouped as
+ * dummy due to provably constant-false join restrictions, so we cannot
+ * end up with a plan that has an Aggref in a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should not have marked rel2_grouped as
+ * dummy due to provably constant-false join restrictions, so we cannot
+ * end up with a plan that has an Aggref in a non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+
+ /*
+ * Since we have generated grouped paths by joining the given pair of
+ * input relations, add the grouped relation to the PlannerInfo if we have
+ * not already done so.
+ */
+ if (yet_to_add)
+ add_grouped_rel(root, rel_grouped);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1810,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 5f3908be51..051276a73e 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +631,276 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+
+ return;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+
+ return;
+ }
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 735560e8ca..22df968629 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,12 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list = makeNode(RelInfoList);
+ root->grouped_rel_list->items = NIL;
+ root->grouped_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -76,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -260,6 +267,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7468961b01..99e46cc152 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -229,7 +229,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3915,9 +3914,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3999,23 +3996,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6906,16 +6896,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7028,7 +7044,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7046,7 +7062,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7054,7 +7070,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7096,19 +7112,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7158,6 +7172,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+ * It is possible that the partially_grouped_rel created by eager
+ * aggregation is dummy. In that case we just set it to NULL; it may be
+ * created again by the logic below if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7181,19 +7210,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel may already have been created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7202,6 +7239,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 45e8b74f94..0e4c7b2b2d 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ /* The exprs must be translated even if there are no sortgrouprefs */
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 4f74cafa25..85e419160b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
*
+ * For grouped relations, cheapest_parameterized_paths additionally includes
+ * the fewest-row unparameterized path, if there is one. Different paths of a
+ * grouped relation can have very different row counts, and in some cases the
+ * cheapest-total unparameterized path is not the one with the fewest rows.
+ *
* This is normally called only after we've finished constructing the path
* list for the rel node.
*/
@@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel)
Path *cheapest_startup_path;
Path *cheapest_total_path;
Path *best_param_path;
+ Path *fewest_row_path;
List *parameterized_paths;
ListCell *p;
@@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel)
elog(ERROR, "could not devise a query plan for the given query");
cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
+ fewest_row_path = NULL;
parameterized_paths = NIL;
foreach(p, parent_rel->pathlist)
@@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path == NULL)
{
cheapest_startup_path = cheapest_total_path = path;
+ if (IS_GROUPED_REL(parent_rel))
+ fewest_row_path = path;
continue;
}
@@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel)
compare_pathkeys(cheapest_total_path->pathkeys,
path->pathkeys) == PATHKEYS_BETTER2))
cheapest_total_path = path;
+
+ /*
+ * Find the fewest-row unparameterized path for a grouped
+ * relation. If we find two paths of the same row count, try to
+ * keep the one with the cheaper total cost; if the costs are
+ * identical, keep the better-sorted one.
+ */
+ if (IS_GROUPED_REL(parent_rel))
+ {
+ if (fewest_row_path->rows > path->rows)
+ fewest_row_path = path;
+ else if (fewest_row_path->rows == path->rows)
+ {
+ cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST);
+ if (cmp > 0 ||
+ (cmp == 0 &&
+ compare_pathkeys(fewest_row_path->pathkeys,
+ path->pathkeys) == PATHKEYS_BETTER2))
+ fewest_row_path = path;
+ }
+ }
}
}
@@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path)
parameterized_paths = lcons(cheapest_total_path, parameterized_paths);
+ /* Add fewest-row unparameterized path, if any, to parameterized_paths */
+ if (fewest_row_path && fewest_row_path != cheapest_total_path)
+ parameterized_paths = lcons(fewest_row_path, parameterized_paths);
+
/*
* If there is no unparameterized path, use the best parameterized path as
* cheapest_total_path (but not as cheapest_startup_path).
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index f96573eb5d..d349ae521b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,19 +29,27 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry in a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelHashEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelHashEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -83,7 +93,17 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
-
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+
+/* Minimum row reduction ratio at which a grouped path is considered useful */
+#define EAGER_AGGREGATE_RATIO 0.5
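+
+/*
+ * For example, with the ratio at 0.5, partially aggregating an input
+ * estimated at 1000 rows is considered useful only if it is expected to
+ * produce at most 500 groups, i.e. it at least halves the number of rows
+ * fed into the joins above.
+ */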
/*
* setup_simple_rel_arrays
@@ -276,6 +296,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +427,99 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have aggregate expressions and grouping expressions
+ * available; otherwise we should not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If the grouped paths for the given base relation are not considered
+ * useful, do not build the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -479,11 +593,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +605,46 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelHashEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing RelOptInfos */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry corresponding to 'relids'.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +654,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +682,28 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->grouped_rel_list, relids);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -619,31 +754,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
+ * add_rel_info
+ * Add given relation to the list, and also add it to the auxiliary
+ * hashtable if there is one.
*/
static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
/* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
+ if (list->hash)
{
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = joinrel;
+ hentry->rel = rel;
}
}
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ add_rel_info(root->grouped_rel_list, rel);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -755,6 +912,7 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1097,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2677,508 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must have
+ * already been created if that is possible. So we can just reuse the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) rel_grouped->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the row reduction ratio is at least EAGER_AGGREGATE_RATIO.
+ */
+ agg_info->agg_useful =
+ (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add the aggregates, marked for partial aggregation, to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
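+ /*
+ * Use sortgroupref 0: the partial aggregate is merely emitted by the
+ * grouped paths, not used as a grouping key.
+ */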
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * row reduction ratio is at least EAGER_AGGREGATE_RATIO.
+ */
+ result->agg_useful =
+ (result->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
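+ /*
+ * For instance, with "t1 LEFT JOIN t2 ON ...", t2 lies on the nullable
+ * side: the join may emit NULL-extended rows for t1 rows that have no
+ * match, and a partial aggregation running below the join would never
+ * see those rows.
+ */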
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current
+ * one.
+ *
+ * If the aggregate needs the current rel plus anything else, grouping
+ * the current rel could make some input variables unavailable for the
+ * higher aggregate and also reduce the number of input rows it
+ * receives.
+ *
+ * If the aggregate does not need the current rel at all, then the
+ * current rel should not be grouped, as we do not support joining two
+ * grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as a grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
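+ /*
+ * (For example, numeric values such as 0.1 and 0.10 compare equal but
+ * have different datum images; collapsing them into one partial group
+ * could change what the upper joins and the final output see.)
+ */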
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
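+ /*
+ * foreach() leaves 'lc' set to NULL if the loop above ran to
+ * completion, so a non-NULL 'lc' means the Var was found in some
+ * aggregate. The Var qualifies only if it is also absent from the
+ * plain Vars of the targetlist and havingQual.
+ */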
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking whether the Var is needed by joins above the
+ * current rel, we want to exclude cases where the Var is needed only in
+ * the final output. So we include "relation 0", which represents the
+ * final targetlist, in the set of relids to be subtracted.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
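+
+ /*
+ * For example, in "SELECT t1.a, sum(t2.c) FROM t1 JOIN t2 USING (b)
+ * GROUP BY t1.a", t1.b is needed by the join above the scan of t1,
+ * whereas t1.a is needed only in the final output; so only t1.b is
+ * reported as needed by an upper join here.
+ */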
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping key in
+ * grouped paths for base or join relations, or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as a grouping key. */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8cf1afbad2..95bd80c4dd 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a2ac7575ca..154fc5b1fa 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -416,6 +416,7 @@
#enable_tidscan = on
#enable_group_by_reordering = on
#enable_distinct_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 58748d2ca6..0b2c51f73e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relations.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,16 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+
+ /*
+ * grouped_rel_list is a list of all grouped-relation RelOptInfos we have
+ * considered in this planning run. This is only used by eager
+ * aggregation.
+ */
+ RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -373,6 +393,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -998,6 +1027,12 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1071,6 +1106,68 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3144,6 +3241,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in the targetlist and HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in the GROUP BY clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 5a6d0350c1..8dde37cbff 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -313,10 +313,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -352,4 +358,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 54869d4401..a189b7f18c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 0b6f0f7969..49614dbd75 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -75,6 +75,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..6370504377 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -171,7 +172,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1edd9e45eb..4fc210e2ef 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f8..95be701ec3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1065,6 +1066,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -1297,7 +1299,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2383,13 +2384,17 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
+RelHashEntry
RelIdCacheEnt
RelIdToTypeIdCacheEntry
RelInfo
RelInfoArr
+RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
Hi.

In create_grouping_expr_infos:

    tce = lookup_type_cache(exprType((Node *) tle->expr),
                            TYPECACHE_BTREE_OPFAMILY);
    if (!OidIsValid(tce->btree_opf) ||
        !OidIsValid(tce->btree_opintype))
        return;
    ....

    /*
     * Get the operator in the btree's opfamily.
     */
    eq_op = get_opfamily_member(tce->btree_opf,
                                tce->btree_opintype,
                                tce->btree_opintype,
                                BTEqualStrategyNumber);
    if (!OidIsValid(eq_op))
        return;
    eq_opfamilies = get_mergejoin_opfamilies(eq_op);
    if (!eq_opfamilies)
        return;
    btree_opfamily = linitial_oid(eq_opfamilies);
If eq_op is valid, then we don't need to call get_mergejoin_opfamilies,
since its output will be the same as tce->btree_opf, and we have
already checked that tce->btree_opf is valid. In other words, I think
eq_op being valid implies that tce->btree_opf is already the btree
opfamily we need.
On Thu, Jan 9, 2025 at 12:15 PM jian he <jian.universality@gmail.com> wrote:
Nice catch! Actually, we can use tce->btree_opf directly, without
needing to check its equality operator, since we already know it is a
valid btree opfamily. If we were starting from a different opfamily
(such as a hash opfamily), we would need to look up its equality
operator and then select some btree opfamily that the operator is part
of. But in this case, that's not necessary.
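In code form, the simplification would look something like this (an
illustrative sketch of the revised lookup, not the exact committed
change):

    tce = lookup_type_cache(exprType((Node *) tle->expr),
                            TYPECACHE_BTREE_OPFAMILY);
    if (!OidIsValid(tce->btree_opf) ||
        !OidIsValid(tce->btree_opintype))
        return;

    /*
     * tce->btree_opf is known to be a valid btree opfamily whose input
     * type matches the grouping expression, so use it directly; there
     * is no need to fetch its equality operator and map that back
     * through get_mergejoin_opfamilies().
     */
    btree_opfamily = tce->btree_opf;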
Thanks
Richard
On Sat, Dec 21, 2024 at 10:05 AM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is the latest patch, which also includes some cosmetic
tweaks. I am aiming to push this by the end of January, so that I have
enough time to react to any bugs before the feature freeze.
Attached is an updated version of this patch that addresses Jian's
review comments, along with some more cosmetic tweaks. I'm going to
be looking at this patch again from the point of view of committing
it, with the plan to commit it late this week or early next week,
barring any further comments or objections.
Thanks
Richard
Attachments:
v16-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From 939ad5d47e6fdbc260fdf41b64ffe2bdd3e4ad2c Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v16] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
A plan with eager aggregation looks like:
EXPLAIN (COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i = b.j
GROUP BY a.i;
  Finalize HashAggregate
    Group Key: a.i
    ->  Nested Loop
          ->  Partial HashAggregate
                Group Key: b.j
                ->  Seq Scan on b
          ->  Index Only Scan using a_pkey on a
                Index Cond: (i = b.j)
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
and store it in a dedicated list.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths during this phase.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
does not seem to be very useful and is currently not supported.
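For concreteness, the hook-in during join search looks roughly like
this (a sketch lifted from the GEQO changes later in this patch):

    /* after set_cheapest(joinrel) for a newly formed join relation */
    if (!bms_equal(joinrel->relids, root->all_query_rels))
    {
        RelOptInfo *rel_grouped = find_grouped_rel(root, joinrel->relids);

        if (rel_grouped)
        {
            /* add sorted/hashed partial aggregation paths atop joinrel */
            generate_grouped_paths(root, rel_grouped, joinrel,
                                   rel_grouped->agg_info);
            set_cheapest(rel_grouped);
        }
    }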
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
'destiny', which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
Since eager aggregation can generate many grouped relations, we
introduce a RelInfoList structure, which encapsulates both a list and
a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for grouped relations.
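In essence the structure is (field names as used by the accessors in
this patch; the full declaration is in the pathnodes.h changes, not
shown here):

    typedef struct RelInfoList
    {
        List        *items; /* list of RelOptInfo or RelAggInfo nodes */
        struct HTAB *hash;  /* optional hashtable for faster lookups */
    } RelInfoList;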
Eager aggregation can use significantly more CPU time and memory than
regular planning when the query involves aggregates and many joining
relations. However, in some cases, the resulting plan can be much
better, justifying the additional planning effort. All the same, for
now, turn this feature off by default.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 80 +
src/backend/optimizer/geqo/geqo_eval.c | 98 +-
src/backend/optimizer/path/allpaths.c | 455 +++++-
src/backend/optimizer/path/costsize.c | 95 +-
src/backend/optimizer/path/joinrels.c | 141 ++
src/backend/optimizer/plan/initsplan.c | 258 ++++
src/backend/optimizer/plan/planmain.c | 17 +-
src/backend/optimizer/plan/planner.c | 99 +-
src/backend/optimizer/util/appendinfo.c | 60 +
src/backend/optimizer/util/pathnode.c | 47 +-
src/backend/optimizer/util/relnode.c | 754 +++++++++-
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 157 +-
src/include/optimizer/pathnode.h | 7 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 192 +++
src/tools/pgindent/typedefs.list | 7 +-
24 files changed, 3655 insertions(+), 160 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b92e2a0fc9..76f88bd3e3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6089,7 +6089,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f41a17b1f..09a3c4caf2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5241,6 +5241,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>off</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index f341d9f303..45236ca46b 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,83 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially pushes
+aggregation past a join, and finalizes it once all the relations are joined.
+Eager aggregation may reduce the number of input rows to the join and thus
+could result in a better overall plan.
+
+For example:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Seq Scan on b
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+If the partial aggregation on table B significantly reduces the number of
+input rows, the join above will be much cheaper, leading to a more efficient
+final plan.
+
+For the partial aggregation that is pushed down to a non-aggregated relation,
+we need to consider all expressions from this relation that are involved in
+upper join clauses and include them in the grouping keys, using compatible
+operators. This is essential to ensure that an aggregated row from the partial
+aggregation matches the other side of the join if and only if each row in the
+partial group does. This ensures that all rows within the same partial group
+share the same 'destiny', which is crucial for maintaining correctness.
+
+One restriction is that we cannot push partial aggregation down to a relation
+that is in the nullable side of an outer join, because the NULL-extended rows
+produced by the outer join would not be available when we perform the partial
+aggregation, while with a non-eager-aggregation plan these rows are available
+for the top-level aggregation. Pushing partial aggregation in this case may
+result in the rows being grouped differently than expected, or produce
+incorrect values from the aggregate functions.
+
+We can also apply eager aggregation to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Hash Join
+                      Hash Cond: (b.j = c.i)
+                      ->  Seq Scan on b
+                      ->  Hash
+                            ->  Seq Scan on c
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+During the construction of the join tree, we evaluate each base or join
+relation to determine if eager aggregation can be applied. If feasible, we
+create a separate RelOptInfo called a "grouped relation" and generate grouped
+paths by adding sorted and hashed partial aggregation paths on top of the
+non-grouped paths. To limit planning time, we consider only the cheapest or
+suitably-sorted non-grouped paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation with a
+non-grouped relation. Joining two grouped relations does not seem to be very
+useful and is currently not supported.
+
+If we have generated a grouped relation for the topmost join relation, we need
+to finalize its paths at the end. The final paths will compete in the usual
+way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac..e69eac9bff 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -39,10 +39,20 @@ typedef struct
int size; /* number of input relations in clump */
} Clump;
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
+
static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
int num_gene, bool force);
static bool desirable_join(PlannerInfo *root,
RelOptInfo *outer_rel, RelOptInfo *inner_rel);
+static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list);
+static void restore_relinfolist(RelInfoList *relinfo_list,
+ RelInfoListInfo *info);
/*
@@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ RelInfoListInfo save_join_rel;
+ RelInfoListInfo save_grouped_rel;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list and
+ * root->grouped_rel_list, which may or may not already contain some
+ * entries. The newly added entries will be recycled by the
+ * MemoryContextDelete below, so we must ensure that each list within the
+ * RelInfoList structures is restored to its former state before exiting.
+ * We can do this by truncating each list to its original length. NOTE
+ * this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables within the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels or
+ * grouped rels, which we very likely are, new hash tables will get built
+ * and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ save_join_rel = save_relinfolist(root->join_rel_list);
+ save_grouped_rel = save_relinfolist(root->grouped_rel_list);
+
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list and grouped_rel_list to its
+ * former state, and put back the original hashtables if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ restore_relinfolist(root->join_rel_list, &save_join_rel);
+ restore_relinfolist(root->grouped_rel_list, &save_grouped_rel);
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
@@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root,
/* Otherwise postpone the join till later. */
return false;
}
+
+/*
+ * Save the original length and hashtable of a RelInfoList.
+ */
+static RelInfoListInfo
+save_relinfolist(RelInfoList *relinfo_list)
+{
+ RelInfoListInfo info;
+
+ info.savelength = list_length(relinfo_list->items);
+ info.savehash = relinfo_list->hash;
+
+ return info;
+}
+
+/*
+ * Restore the original length and hashtable of a RelInfoList.
+ */
+static void
+restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info)
+{
+ relinfo_list->items = list_truncate(relinfo_list->items, info->savelength);
+ relinfo_list->hash = info->savehash;
+}
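
To spell out the save/restore contract above: remember the list length and the
hash pointer, let the trial join search append entries and build a local hash
table, then truncate back and reinstall the original hash. A minimal
standalone C sketch of the same pattern follows (ToyRelInfoList and every name
in it are invented for illustration; this is not PostgreSQL code and does not
use the List API):

#include <stdio.h>

/* A toy growable list standing in for RelInfoList */
typedef struct
{
    int     *entries;
    int      nentries;
    void    *hash;          /* stand-in for the auxiliary hash table */
} ToyRelInfoList;

typedef struct
{
    int      savelength;
    void    *savehash;
} ToySaveState;

static ToySaveState
toy_save(ToyRelInfoList *list)
{
    ToySaveState s = {list->nentries, list->hash};

    list->hash = NULL;      /* the trial phase builds its own local hash */
    return s;
}

static void
toy_restore(ToyRelInfoList *list, const ToySaveState *s)
{
    /* Safe only because trial entries were appended at the end */
    list->nentries = s->savelength;
    list->hash = s->savehash;
}

int
main(void)
{
    int             storage[8] = {1, 2, 3};
    ToyRelInfoList  list = {storage, 3, (void *) 0x1};  /* fake outer hash */
    ToySaveState    s = toy_save(&list);

    list.entries[list.nentries++] = 42;     /* trial-phase append */
    toy_restore(&list, &s);

    printf("nentries=%d hash=%p\n", list.nentries, list.hash);
    return 0;
}

The restore step is only correct because new entries are appended strictly at
the end, which is the invariant the NOTE above insists on.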
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 3364589391..836c0bcbf5 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ rel_grouped = build_simple_grouped_rel(root, rel);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3306,6 +3394,318 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ /*
+ * If the grouped paths for the given relation are not considered useful,
+ * do not bother to generate them.
+ */
+ if (!agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3414,9 +3814,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify the items list and hash table of root->join_rel_list (and, with
+ * eager aggregation, of root->grouped_rel_list). If you want to do more than
+ * one join-order search, you'll probably need to save and restore the
+ * original states of those data structures. See geqo_eval() for an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
@@ -3465,6 +3866,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * We also run generate_grouped_paths() for the grouped relation of each
+ * just-processed joinrel, followed by set_cheapest() for that grouped
+ * relation.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3485,6 +3890,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4353,6 +4779,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids);
+ if (rel_grouped)
+ {
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ rel_grouped->agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
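
For readers less familiar with the aggsplit machinery used above: partial
aggregation (AGGSPLIT_INITIAL_SERIAL) emits one transition value per group,
and the final phase later combines those transition values and applies the
final function. A minimal standalone C sketch of that split for AVG (all
types and names here are invented; real transition states are serialized
datums, not plain structs):

#include <stdio.h>

/* Toy transition state for AVG, standing in for a partial-aggregate result */
typedef struct
{
    double  sum;
    long    count;
} AvgTrans;

#define NGROUPS 2

/* Phase 1: partial aggregation (think AGGSPLIT_INITIAL_SERIAL) */
static void
partial_agg(const int *keys, const double *vals, int n, AvgTrans *out)
{
    for (int i = 0; i < n; i++)
    {
        out[keys[i]].sum += vals[i];
        out[keys[i]].count++;
    }
}

/* Phase 2: combine two transition values and finalize (FINAL_DESERIAL) */
static double
final_avg(const AvgTrans *a, const AvgTrans *b)
{
    double  sum = a->sum + b->sum;
    long    cnt = a->count + b->count;

    return cnt ? sum / cnt : 0.0;
}

int
main(void)
{
    /* Two "input paths" feeding the same grouped relation */
    int     k1[] = {0, 1, 0};
    double  v1[] = {1.0, 2.0, 3.0};
    int     k2[] = {0, 1, 1};
    double  v2[] = {5.0, 4.0, 6.0};
    AvgTrans t1[NGROUPS] = {{0}};
    AvgTrans t2[NGROUPS] = {{0}};

    partial_agg(k1, v1, 3, t1);
    partial_agg(k2, v2, 3, t2);

    for (int g = 0; g < NGROUPS; g++)
        printf("group %d: avg=%.2f\n", g, final_avg(&t1[g], &t2[g]));
    return 0;
}

This is also why the qual is passed as NIL when building the partial Agg
paths above: HAVING can only be checked once the final value is known.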
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ec004ed949..78ea6550a4 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -180,6 +180,8 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context);
static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
ParamPathInfo *param_info,
QualCost *qpqual_cost);
+static void set_joinpath_size(PlannerInfo *root, JoinPath *jpath,
+ SpecialJoinInfo *sjinfo);
static bool has_indexed_join_quals(NestPath *path);
static double approx_tuple_count(PlannerInfo *root, JoinPath *path,
List *quals);
@@ -3370,19 +3372,7 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
if (inner_path_rows <= 0)
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* cost of inner-relation source data (we already dealt with outer rel) */
@@ -3867,19 +3857,7 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
inner_path_rows = 1;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/*
* Compute cost of the mergequals and qpquals (other restriction clauses)
@@ -4299,19 +4277,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
path->jpath.path.disabled_nodes = workspace->disabled_nodes;
/* Mark the path with the correct row estimate */
- if (path->jpath.path.param_info)
- path->jpath.path.rows = path->jpath.path.param_info->ppi_rows;
- else
- path->jpath.path.rows = path->jpath.path.parent->rows;
-
- /* For partial paths, scale row estimate. */
- if (path->jpath.path.parallel_workers > 0)
- {
- double parallel_divisor = get_parallel_divisor(&path->jpath.path);
-
- path->jpath.path.rows =
- clamp_row_est(path->jpath.path.rows / parallel_divisor);
- }
+ set_joinpath_size(root, &path->jpath, extra->sjinfo);
/* mark the path with estimated # of batches */
path->num_batches = numbatches;
@@ -5061,6 +5027,57 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel,
*qpqual_cost = baserel->baserestrictcost;
}
+/*
+ * set_joinpath_size
+ * Set the correct row estimate for the given join path.
+ *
+ * 'jpath' is the join path under consideration.
+ * 'sjinfo' is any SpecialJoinInfo relevant to this join.
+ *
+ * Note that for a grouped join relation, its paths could have very different
+ * rowcount estimates, so we need to calculate the rowcount estimate using the
+ * outer path and inner path of the given join path.
+ */
+static void
+set_joinpath_size(PlannerInfo *root, JoinPath *jpath, SpecialJoinInfo *sjinfo)
+{
+ if (IS_GROUPED_REL(jpath->path.parent))
+ {
+ Path *outer_path = jpath->outerjoinpath;
+ Path *inner_path = jpath->innerjoinpath;
+
+ /*
+ * Estimate the number of rows of this grouped join path as the sizes
+ * of the outer and inner paths times the selectivity of the clauses
+ * that have ended up at this join node.
+ */
+ jpath->path.rows = calc_joinrel_size_estimate(root,
+ jpath->path.parent,
+ outer_path->parent,
+ inner_path->parent,
+ outer_path->rows,
+ inner_path->rows,
+ sjinfo,
+ jpath->joinrestrictinfo);
+ }
+ else
+ {
+ if (jpath->path.param_info)
+ jpath->path.rows = jpath->path.param_info->ppi_rows;
+ else
+ jpath->path.rows = jpath->path.parent->rows;
+
+ /* For partial paths, scale row estimate. */
+ if (jpath->path.parallel_workers > 0)
+ {
+ double parallel_divisor = get_parallel_divisor(&jpath->path);
+
+ jpath->path.rows =
+ clamp_row_est(jpath->path.rows / parallel_divisor);
+ }
+ }
+}
+
/*
* compute_semi_anti_join_factors
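
The per-path branch in set_joinpath_size() exists because two paths of the
same grouped joinrel may place the partial aggregation differently and thus
carry very different row counts, so the relation-level estimate cannot simply
be reused. A standalone arithmetic sketch of the two branches (the 0.5 leader
fudge below is a rough stand-in for get_parallel_divisor(), which also models
decreasing leader participation; all numbers are made up):

#include <math.h>
#include <stdio.h>

static double
toy_clamp_row_est(double nrows)
{
    /* Mirror the idea of clamping to at least one row */
    return (nrows < 1.0) ? 1.0 : rint(nrows);
}

/* Grouped-join branch: size comes from this path's actual inputs */
static double
grouped_join_rows(double outer_rows, double inner_rows, double sel)
{
    return toy_clamp_row_est(outer_rows * inner_rows * sel);
}

/* Plain branch: relation-level estimate, scaled for parallel workers */
static double
plain_join_rows(double rel_rows, int parallel_workers)
{
    double  divisor = (parallel_workers > 0) ? parallel_workers + 0.5 : 1.0;

    return toy_clamp_row_est(rel_rows / divisor);
}

int
main(void)
{
    printf("grouped: %.0f\n", grouped_join_rows(100, 1e6, 1e-4));
    printf("plain:   %.0f\n", plain_join_rows(1e6, 2));
    return 0;
}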
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index c2eb300ea9..88ab272479 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -882,6 +889,135 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *rel_grouped;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+ bool rel1_empty;
+ bool rel2_empty;
+ bool yet_to_add = false;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * See if we already have a grouped joinrel for this joinrel.
+ */
+ rel_grouped = find_grouped_rel(root, joinrel->relids);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ /*
+ * If the grouped paths for the given join relation are considered
+ * useful, add the grouped relation we just built to the PlannerInfo
+ * to make it available for further joining or for acting as the upper
+ * rel representing the result of partial aggregation. Otherwise, we
+ * need to postpone the decision on adding the grouped relation to the
+ * PlannerInfo, as it depends on whether we can generate any grouped
+ * paths by joining the given pair of input relations.
+ */
+ if (agg_info->agg_useful)
+ add_grouped_rel(root, rel_grouped);
+ else
+ yet_to_add = true;
+ }
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(rel_grouped))
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids);
+ rel2_grouped = find_grouped_rel(root, rel2->relids);
+
+ rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped));
+ rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_empty && rel2_empty)
+ return;
+
+ /* Joining two grouped relations is currently not supported */
+ if (!rel1_empty && !rel2_empty)
+ return;
+
+ /* Generate partial aggregation paths for the grouped relation */
+ if (!rel1_empty)
+ {
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should not have marked rel1_grouped as
+ * dummy because of provably constant-false join restrictions;
+ * otherwise we could end up with a plan that has an Aggref in a
+ * non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel1_grouped));
+ }
+ else if (!rel2_empty)
+ {
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+
+ /*
+ * populate_joinrel_with_paths should not have marked rel2_grouped as
+ * dummy because of provably constant-false join restrictions;
+ * otherwise we could end up with a plan that has an Aggref in a
+ * non-Agg plan node.
+ */
+ Assert(!IS_DUMMY_REL(rel2_grouped));
+ }
+
+ /*
+ * Since we have generated grouped paths by joining the given pair of
+ * input relations, add the grouped relation to the PlannerInfo if we have
+ * not already done so.
+ */
+ if (yet_to_add)
+ add_grouped_rel(root, rel_grouped);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1674,6 +1810,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
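
To see why joining a grouped input can win, consider the rough numbers in the
standalone sketch below (every value is made up for illustration): once one
side has been reduced to one row per grouping key, both the join's input and
its output shrink accordingly.

#include <stdio.h>

int
main(void)
{
    double  rel1_rows = 1e6;    /* side carrying the aggregate's input */
    double  rel1_groups = 100;  /* distinct grouping keys on that side */
    double  rel2_rows = 1e4;
    double  join_sel = 1e-6;

    /* Aggregate after the join: the join must process all rel1 rows */
    double  lazy_join_out = rel1_rows * rel2_rows * join_sel;

    /* Partial aggregate below the join: one row per group remains */
    double  eager_join_out = rel1_groups * rel2_rows * join_sel;

    printf("rel1 rows entering the join: lazy=%.0f eager=%.0f\n",
           rel1_rows, rel1_groups);
    printf("join output rows: lazy=%.0f eager=%.0f\n",
           lazy_join_out, eager_join_out);
    return 0;
}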
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 2cb0ae6d65..0821723754 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +631,261 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+	 * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in targetlist
+ * and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ list_free(tlist_exprs);
+
+ return;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ list_free(tlist_exprs);
+
+ return;
+ }
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
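
The BTEQUALIMAGE_PROC test above guards against comparator-equal values with
different byte images. Here is a tiny standalone C illustration of the
hazard, using strings that parse to the same double (this is only an analogy;
the real check consults the btree opfamily's equalimage support function):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    /* Two input values: "equal" under the comparator, byte-different */
    const char *a = "0.5";
    const char *b = "0.50";

    int     cmp_equal = (strtod(a, NULL) == strtod(b, NULL));
    int     image_equal = (strcmp(a, b) == 0);

    printf("comparator-equal: %d, image-equal: %d\n",
           cmp_equal, image_equal);

    /*
     * If we grouped early by comparator equality and kept "0.5" as the
     * group's representative, a later qual such as length(col) = 4 would
     * be evaluated against the wrong byte image for the "0.50" input.
     */
    return 0;
}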
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ade23fd9d5..30cec6d9b2 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -64,8 +64,12 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
+ root->grouped_rel_list = makeNode(RelInfoList);
+ root->grouped_rel_list->items = NIL;
+ root->grouped_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
@@ -76,6 +80,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -260,6 +267,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6803edd085..dce50e4837 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -229,7 +229,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3915,9 +3914,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -3999,23 +3996,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -6906,16 +6896,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7028,7 +7044,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7046,7 +7062,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7054,7 +7070,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7096,19 +7112,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7158,6 +7172,21 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+	 * The partially_grouped_rel may already have been created by eager
+	 * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
+ /*
+	 * It is possible that the partially_grouped_rel created by eager
+	 * aggregation is dummy. In that case, just set it to NULL; the logic
+	 * below may then create it again if possible.
+ */
+ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel))
+ partially_grouped_rel = NULL;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7181,19 +7210,27 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+	 * Note that the partially_grouped_rel may already have been created and
+	 * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ (partially_grouped_rel == NULL ||
+ partially_grouped_rel->pathlist == NIL) &&
!force_rel_creation)
return NULL;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
@@ -7202,6 +7239,14 @@ create_partial_grouping_paths(PlannerInfo *root,
partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent;
partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine;
+ /*
+ * Partially-grouped partial paths may have been generated by eager
+ * aggregation. If we find that parallelism is not possible for
+ * partially_grouped_rel, we need to drop these partial paths.
+ */
+ if (!partially_grouped_rel->consider_parallel)
+ partially_grouped_rel->partial_pathlist = NIL;
+
/*
* Build target list for partial aggregate paths. These paths cannot just
* emit the same tlist as regular aggregate paths, because (1) we must
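
The split of dNumGroups and dNumFinalGroups above matters because the final
phase of a two-phase aggregation consumes the partially grouped rows, not the
original input rows. A crude standalone sketch (toy_estimate_groups() is a
deliberately naive min(rows, ndistinct) stand-in for estimate_num_groups(),
which is far more sophisticated; the numbers are invented):

#include <stdio.h>

/* Crude stand-in for estimate_num_groups(): never more groups than rows */
static double
toy_estimate_groups(double input_rows, double ndistinct)
{
    return (input_rows < ndistinct) ? input_rows : ndistinct;
}

int
main(void)
{
    double  ndistinct = 1000;
    double  cheapest_path_rows = 1e6;       /* non-split aggregation */
    double  partially_grouped_rows = 400;   /* after partial agg + join */

    printf("dNumGroups:      %.0f\n",
           toy_estimate_groups(cheapest_path_rows, ndistinct));
    printf("dNumFinalGroups: %.0f\n",
           toy_estimate_groups(partially_grouped_rows, ndistinct));
    return 0;
}

When a selective join leaves fewer partially grouped rows than there are
distinct keys, using the old dNumGroups for the final phase would overestimate
its output.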
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index cece3a5be7..20cfe95340 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+		/* Translate the expressions whether or not sortgrouprefs is set */
+		newtarget->exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+										   context);
+
+		if (oldtarget->sortgrouprefs)
+		{
+			Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+			newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+			memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+		}
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
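
Both mutator cases above follow the usual flat-copy-then-fix-up shape: memcpy
the node, then rebuild only the fields that mention relations. A standalone C
sketch of that shape (ToyNode and translate_relids() are invented stand-ins
for the real node types and adjust_child_relids()):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy node with one flat field and one substructure, like RelAggInfo */
typedef struct ToyNode
{
    int     flat;       /* flat-copiable */
    int    *relids;     /* must be translated per child */
    int     nrelids;
} ToyNode;

/* Stand-in for adjust_child_relids(): map a parent id to a child id */
static int *
translate_relids(const int *in, int n, int from, int to)
{
    int    *out = malloc(n * sizeof(int));

    for (int i = 0; i < n; i++)
        out[i] = (in[i] == from) ? to : in[i];
    return out;
}

static ToyNode *
adjust_node(const ToyNode *old, int from, int to)
{
    ToyNode *newnode = malloc(sizeof(ToyNode));

    /* Copy all flat-copiable fields ... */
    memcpy(newnode, old, sizeof(ToyNode));
    /* ... then rebuild the fields that reference relations */
    newnode->relids = translate_relids(old->relids, old->nrelids, from, to);
    return newnode;
}

int
main(void)
{
    int      parent_ids[] = {1, 3};
    ToyNode  parent = {42, parent_ids, 2};
    ToyNode *child = adjust_node(&parent, 3, 7);

    printf("flat=%d relids={%d,%d}\n",
           child->flat, child->relids[0], child->relids[1]);
    return 0;
}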
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 93e73cb44d..9d5df0553b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
* unparameterized path, too, if there is one; the users of that list find
* it more convenient if that's included.
*
+ * For grouped relations, cheapest_parameterized_paths also always includes
+ * the fewest-row unparameterized path, if there is one. Different paths of
+ * a grouped relation can have very different row counts, and in some cases
+ * the cheapest-total unparameterized path is not the one with the fewest
+ * rows.
+ *
* This is normally called only after we've finished constructing the path
* list for the rel node.
*/
@@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel)
Path *cheapest_startup_path;
Path *cheapest_total_path;
Path *best_param_path;
+ Path *fewest_row_path;
List *parameterized_paths;
ListCell *p;
@@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel)
elog(ERROR, "could not devise a query plan for the given query");
cheapest_startup_path = cheapest_total_path = best_param_path = NULL;
+ fewest_row_path = NULL;
parameterized_paths = NIL;
foreach(p, parent_rel->pathlist)
@@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path == NULL)
{
cheapest_startup_path = cheapest_total_path = path;
+ if (IS_GROUPED_REL(parent_rel))
+ fewest_row_path = path;
continue;
}
@@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel)
compare_pathkeys(cheapest_total_path->pathkeys,
path->pathkeys) == PATHKEYS_BETTER2))
cheapest_total_path = path;
+
+ /*
+ * Find the fewest-row unparameterized path for a grouped
+ * relation. If we find two paths of the same row count, try to
+ * keep the one with the cheaper total cost; if the costs are
+ * identical, keep the better-sorted one.
+ */
+ if (IS_GROUPED_REL(parent_rel))
+ {
+ if (fewest_row_path->rows > path->rows)
+ fewest_row_path = path;
+ else if (fewest_row_path->rows == path->rows)
+ {
+ cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST);
+ if (cmp > 0 ||
+ (cmp == 0 &&
+ compare_pathkeys(fewest_row_path->pathkeys,
+ path->pathkeys) == PATHKEYS_BETTER2))
+ fewest_row_path = path;
+ }
+ }
}
}
@@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel)
if (cheapest_total_path)
parameterized_paths = lcons(cheapest_total_path, parameterized_paths);
+ /* Add fewest-row unparameterized path, if any, to parameterized_paths */
+ if (fewest_row_path && fewest_row_path != cheapest_total_path)
+ parameterized_paths = lcons(fewest_row_path, parameterized_paths);
+
/*
* If there is no unparameterized path, use the best parameterized path as
* cheapest_total_path (but not as cheapest_startup_path).
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
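
The fewest-row bookkeeping added to set_cheapest() is just a second minimum
tracked alongside cheapest-total, with total cost as the tie-breaker. A
standalone C sketch of the selection rule (ToyPath is invented; the real code
additionally breaks exact cost ties on pathkeys):

#include <stdio.h>

typedef struct
{
    double  rows;
    double  total_cost;
} ToyPath;

/* Pick the fewest-row path; break row-count ties on total cost */
static const ToyPath *
fewest_row_path(const ToyPath *paths, int npaths)
{
    const ToyPath *best = &paths[0];

    for (int i = 1; i < npaths; i++)
    {
        if (paths[i].rows < best->rows ||
            (paths[i].rows == best->rows &&
             paths[i].total_cost < best->total_cost))
            best = &paths[i];
    }
    return best;
}

int
main(void)
{
    /* Grouped-rel paths can have wildly different row counts */
    ToyPath paths[] = {
        {5000, 100.0},      /* cheapest-total, but many rows */
        {120, 180.0},
        {120, 150.0},       /* fewest rows at the lowest cost */
    };
    const ToyPath *p = fewest_row_path(paths, 3);

    printf("fewest-row path: rows=%.0f cost=%.1f\n",
           p->rows, p->total_cost);
    return 0;
}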
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a0..0f72110063 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,19 +29,27 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry in a hash table that we use to make lookups of RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelHashEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelHashEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -83,7 +93,17 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
-
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+
+/* Minimum row reduction ratio at which a grouped path is considered useful */
+#define EAGER_AGGREGATE_RATIO 0.5
/*
* setup_simple_rel_arrays
@@ -276,6 +296,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +427,99 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If the grouped paths for the given base relation are not considered
+ * useful, do not build the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+ rel_grouped->agg_info = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -479,11 +593,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -491,47 +605,46 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelHashEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing RelOptInfos */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry corresponding to 'relids'.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -541,23 +654,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -569,6 +682,28 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * find_grouped_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for grouped relations.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->grouped_rel_list, relids);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -619,31 +754,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
+ * add_rel_info
+ * Add given relation to the list, and also add it to the auxiliary
+ * hashtable if there is one.
*/
static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
/* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
+ if (list->hash)
{
- JoinHashEntry *hentry;
+ RelHashEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelHashEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = joinrel;
+ hentry->rel = rel;
}
}
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
+/*
+ * add_grouped_rel
+ * Add given grouped relation to the list of grouped relations in the
+ * given PlannerInfo.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ add_rel_info(root->grouped_rel_list, rel);
+}
+
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -755,6 +912,7 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1097,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2677,504 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one, whose
+ * reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent must already
+ * have been created if that was possible. So we can simply use the
+ * parent's RelAggInfo, if any, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ Assert(!bms_is_empty(rel->top_parent_relids));
+ rel_grouped = find_grouped_rel(root, rel->top_parent_relids);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(rel_grouped));
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) rel_grouped->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the row reduction ratio is no less than EAGER_AGGREGATE_RATIO.
+ */
+ agg_info->agg_useful =
+ (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * row reduction ratio is no less than EAGER_AGGREGATE_RATIO.
+ */
+ result->agg_useful =
+ (result->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));
+
+ return result;
+}
+
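To make the agg_useful test concrete with a hedged example: if
EAGER_AGGREGATE_RATIO were 0.5 (an assumed value for illustration; the
actual constant is defined elsewhere in the patch set) and rel->rows is
1000, the grouped paths are considered useful only when grouped_rows is
estimated at no more than 1000 * (1 - 0.5) = 500, i.e. when partial
aggregation at this relation level is expected to at least halve the
number of rows fed into the joins above.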
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated at this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * reference the current relation at all, the current relation should
+ * not be grouped, as we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
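As a concrete illustration of the nullable-side restriction checked
above, consider the following query shape (it mirrors one of the
regression tests included later in this patch; table names are from
that test):

    -- t2 is on the nullable side of the left join, so partially
    -- aggregating t2 before the join would miss the NULL-extended rows
    -- the join produces; eager aggregation is not applied here.
    SELECT t2.b, avg(t2.c)
    FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t2.b;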
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as a grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression can be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
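For a concrete example of the second branch above (a grouping key
created for a Var that is needed by an upper join), the first
regression test included later in this patch does exactly this:

    SELECT t1.a, avg(t2.c)
    FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
    GROUP BY t1.a;

Here t2.b appears nowhere in the GROUP BY clause, but it is the join
key for the join above the partially aggregated t2, so
init_grouping_targets makes it a grouping key of the partial
aggregation -- note the 'Group Key: t2.b' under the Partial
HashAggregate in the EXPLAIN output below.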
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final output. So we
+ * add "relation 0", which represents the final output, to the relids.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can act as a grouping expression,
+ * or 0 otherwise.
+ *
+ * We first check if 'expr' is among the grouping expressions. If it is not,
+ * we then check if 'expr' is known equal to any of the grouping expressions
+ * due to equivalence relationships.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot act as a grouping expression. */
+ return 0;
+}
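To illustrate the second check in get_expression_sortgroupref(), a
hedged sketch (not one of the regression tests; t1/t2 here are
hypothetical tables):

    SELECT t1.a, avg(t2.c)
    FROM t1 JOIN t2 ON t1.a = t2.a
    GROUP BY t1.a;

t2.a is not itself listed in GROUP BY, but exprs_known_equal() can
prove it equal to t1.a via the equivalence class built from the join
clause, so it inherits t1.a's sortgroupref and can serve as the
grouping key when the aggregation is pushed down to t2.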
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index c9d8cd796a..2286b981c3 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b2bc43383d..e142d37c70 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -416,6 +416,7 @@
#enable_tidscan = on
#enable_group_by_reordering = on
#enable_distinct_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
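Since the GUC defaults to off, eager aggregation has to be enabled
explicitly before the planner will consider it, e.g. per session:

    SET enable_eager_aggregate = on;

which is also how the regression test below exercises the feature.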
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 54ee17697e..44728e5522 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * A structure consisting of a list and a hash table to store relations.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -270,15 +289,16 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
+
+ /*
+ * grouped_rel_list is a list of all grouped-relation RelOptInfos we have
+ * considered in this planning run. This is only used by eager
+ * aggregation.
+ */
+ RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -373,6 +393,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -614,7 +643,9 @@ typedef struct PartitionSchemeData *PartitionScheme;
* the set of RT indexes for its component baserels, along with RT indexes
* for any outer joins it has computed. We create RelOptInfo nodes for each
* baserel and joinrel, and store them in the PlannerInfo's simple_rel_array
- * and join_rel_list respectively.
+ * and join_rel_list respectively. We also create RelOptInfo nodes for each
+ * grouped relation when eager aggregation is enabled, and store them in the
+ * PlannerInfo's grouped_rel_list.
*
* Note that there is only one joinrel for any given set of component
* baserels, no matter what order we assemble them in; so an unordered
@@ -679,7 +710,10 @@ typedef struct PartitionSchemeData *PartitionScheme;
* cheapest_unique_path - for caching cheapest path to produce unique
* (no duplicates) output from relation; NULL if not yet requested
* cheapest_parameterized_paths - best paths for their parameterizations;
- * always includes cheapest_total_path, even if that's unparameterized
+ * always includes cheapest_total_path, even if that's unparameterized;
+ * in the grouped relation case, always includes the unparameterized
+ * path with the fewest rows, if there is one and it is not
+ * cheapest_total_path
* direct_lateral_relids - rels this rel has direct LATERAL references to
* lateral_relids - required outer rels for LATERAL, as a Relids set
* (includes both direct and indirect lateral references)
@@ -998,6 +1032,12 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1071,6 +1111,68 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
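As a rough illustration of the difference between 'target' and
'agg_input' (a sketch based on the first regression test below, not
literal planner output): for the partially aggregated t2 in that test,
the two targets would look roughly like

    target:    t2.b, PARTIAL avg(t2.c)   -- emitted by the grouped paths
    agg_input: t2.b, t2.c                -- provided by the input paths

with a sortgroupref set on t2.b in both.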
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3144,6 +3246,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in the targetlist and HAVING clause
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in the GROUP BY clause
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 719be3897f..7747fb3397 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -313,10 +313,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -352,4 +358,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 46955d128f..5e9d9597b9 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index fee3378bbe..9fc4550158 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -75,6 +75,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..9f63472eff
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1308 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+------+-------
+ 0 | 500 | 100
+ 6 | 1100 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+ x | sum | count
+----+------+-------
+ 2 | 600 | 50
+ 4 | 1200 | 50
+ 8 | 900 | 50
+ 12 | 600 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+-------
+ 0 | 10000
+ 2 | 14000
+ 4 | 18000
+ 6 | 22000
+ 8 | 26000
+ 10 | 10000
+ 12 | 14000
+ 14 | 18000
+ 16 | 22000
+ 18 | 26000
+ 20 | 10000
+ 22 | 14000
+ 24 | 18000
+ 26 | 22000
+ 28 | 26000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Incremental Sort
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Sort Key: t2_3.x, t3_3.y
+ Presorted Key: t2_3.x
+ -> Merge Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Merge Cond: (t2_3.x = t3_3.x)
+ -> Sort
+ Output: t2_3.y, t2_3.x
+ Sort Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Sort
+ Output: t3_3.y, t3_3.x
+ Sort Key: t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t2_5.x, t3_5.y, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 91089ac215..6370504377 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -171,7 +172,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1edd9e45eb..4fc210e2ef 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eb93debe10..af551da13e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1066,6 +1067,7 @@ GrantTargetType
Group
GroupByOrdering
GroupClause
+GroupExprInfo
GroupPath
GroupPathExtraData
GroupResultPath
@@ -1298,7 +1300,6 @@ Join
JoinCostWorkspace
JoinDomain
JoinExpr
-JoinHashEntry
JoinPath
JoinPathExtraData
JoinState
@@ -2384,13 +2385,17 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
+RelHashEntry
RelIdCacheEnt
RelIdToTypeIdCacheEntry
RelInfo
RelInfoArr
+RelInfoList
+RelInfoListInfo
RelMapFile
RelMapping
RelOptInfo
--
2.43.0
On Sun, Jan 12, 2025 at 9:04 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated version of this patch that addresses Jian's
review comments, along with some more cosmetic tweaks. I'm going to
be looking at this patch again from the point of view of committing
it, with the plan to commit it late this week or early next week,
barring any further comments or objections.
I feel this is rushed. This is a pretty big patch touching a sensitive
area of the code. I'm the only senior hacker who has reviewed it, and
I would say that I've only reviewed it pretty lightly, and that the
concerns I raised were fairly substantial. I don't think it's
customary to go from that point to commit after one more patch
revision. This really deserves to be looked at by multiple senior
hackers familiar with the planner; or at least by Tom.
My core concerns here are still what they were in the first email I
posted to the thread: it's unclear that the cost model will deliver
meaningful answers for the grouped rels, and it doesn't seem like
you've done enough to limit the overhead of the feature.
With regard to the first, I reiterate that we are in general quite bad
at having meaningful statistics for anything above an aggregate, and
this patch greatly expands how much of a query could be above an
aggregate. I felt back in August when I did my first review, and still
feel now, that when faced with a query where aggregation could be done
at any of several levels, the chances of picking the right one are not
much better than random. Why do you think otherwise?
With regard to the second, I suggested several lines of thinking back
in August that could lead to limiting the number of grouped_rels that
we create, but it doesn't really look like much of anything has
changed. We're still creating a partially grouped rel for every
baserel in the query, and every joinrel in the query. I'm not very
happy with "let's just turn it off by default" as the answer to that
concern. A lot of people won't enable the feature, which will mean it
doesn't have much value to our users, and those who do will still see
a lot of overhead. Maybe I'm wrong, but I bet with some good
heuristics the planning cost of this could be reduced by an order of
magnitude or more. If that were done, we could imagine eventually (or
maybe even immediately) enabling this by default; without that, we
still have the burden of maintaining this code and keeping it working,
but almost nobody will benefit.
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>off</literal>.
I'm a bit concerned about the naming here. I feel like we're adding an
increasing number of planner features with an increasing number of
disabling GUCs that are all a bit random. I kind of wonder if this
should be called enable_incremental_aggregate. Maybe that's worse,
because "eager" is a word we're not using for anything yet, so using
it here improves greppability and perhaps understandability. On the
other hand, the aggregate that is pushed down by this feature is
always partial (I believe) so we still need a finalize step later,
which means we're aggregating incrementally. There's some nice parity
with incremental sort, too, perhaps.
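(To make that concrete: written out in plain SQL with hypothetical tables,
and assuming the aggregate is sum() and the join is an inner join on the
grouping key, the transformation turns

select a.i, sum(b.v)
from a join b on a.i = b.j
group by a.i;

into the equivalent of

select a.i, sum(pb.s)
from a join (select j, sum(v) as s from b group by j) pb on a.i = pb.j
group by a.i;

with the inner GROUP BY acting as the pushed-down partial step and the
outer one as the finalize step. This is only a sketch of the idea, not
the plan shape the patch actually generates.)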
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
Both the comment and the name of the data type are completely meaningless.
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
This would be the fifth copy of this comment. It's not entirely this
patch's fault, of course, but some kind of refactoring or cleanup is
probably needed here.
+ * cheapest_parameterized_paths also always includes the fewest-row
+ * unparameterized path, if there is one, for grouped relations. Different
+ * paths of a grouped relation can have very different row counts, and in some
+ * cases the cheapest-total unparameterized path may not be the one with the
+ * fewest row.
As I said back in October, this seems like mixing together in one
RelOptInfo paths that really belong to two different RelOptInfos.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jan 15, 2025 at 12:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jan 12, 2025 at 9:04 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated version of this patch that addresses Jian's
review comments, along with some more cosmetic tweaks. I'm going to
be looking at this patch again from the point of view of committing
it, with the plan to commit it late this week or early next week,
barring any further comments or objections.
I feel this is rushed. This is a pretty big patch touching a sensitive
area of the code. I'm the only senior hacker who has reviewed it, and
I would say that I've only reviewed it pretty lightly, and that the
concerns I raised were fairly substantial. I don't think it's
customary to go from that point to commit after one more patch
revision. This really deserves to be looked at by multiple senior
hackers familiar with the planner; or at least by Tom.
Thank you for your input. In fact, there have been several changes
since your last review, as I mentioned in the off-list email.
However, I agree that it would be great if someone else, especially
Tom, could take a look at this patch.
My core concerns here are still what they were in the first email I
posted to the thread: it's unclear that the cost model will deliver
meaningful answers for the grouped rels, and it doesn't seem like
you've done enough to limit the overhead of the feature.
With regard to the first, I reiterate that we are in general quite bad
at having meaningful statistics for anything above an aggregate, and
this patch greatly expands how much of a query could be above an
aggregate. I felt back in August when I did my first review, and still
feel now, that when faced with a query where aggregation could be done
at any of several levels, the chances of picking the right one are not
much better than random. Why do you think otherwise?
I understand that we're currently quite bad at estimating the number
of groups after aggregation. In fact, it's not just aggregation
estimates — we're also bad at join estimates in some cases. This is a
reality we have to face. Here's what I think: we should be trying our
best to cost each node type as accurately as possible, and then build
the upper nodes based on those costs. We should not conclude that,
because we are unable to accurately cost one node type, we should
avoid any cost-based optimizations above that node.
Actually, performing aggregation before joins is not a new concept;
consider JOIN_UNIQUE_OUTER/INNER, for example:
explain (costs off)
select * from t t1 join t t2 on t1.b = t2.b
where (t1.a, t1.b) in
(select t3.a, t3.b from t t3, t t4 where t3.a > t4.a);
QUERY PLAN
------------------------------------------------------
Hash Join
Hash Cond: ((t2.b = t1.b) AND (t3.a = t1.a))
-> Hash Join
Hash Cond: (t2.b = t3.b)
-> Seq Scan on t t2
-> Hash
-> HashAggregate
Group Key: t3.a, t3.b
-> Nested Loop
Join Filter: (t3.a > t4.a)
-> Seq Scan on t t3
-> Materialize
-> Seq Scan on t t4
-> Hash
-> Seq Scan on t t1
(15 rows)
I believe the HashAggregate node in this plan faces the same problem
with inaccurate estimates. However, I don't think it's reasonable to
say that, because we cannot accurately cost the Aggregate node, we
should disregard considering JOIN_UNIQUE_OUTER/INNER.
Back in August, I responded to this issue by "Maybe we can run some
benchmarks first and investigate the regressions discovered on a
case-by-case basis". In October, I ran the TPC-DS benchmark at scale
10 and observed that eager aggregation was applied in 7 queries, with
no notable regressions. In contrast, Q4 and Q11 showed performance
improvements of 3–4 times. Please see [1]/messages/by-id/CAMbWs49DrR8Gkp3TUwFJV_1ShtmLzQUq3mOYD+GyF+Y3AmmrFw@mail.gmail.com.
With regard to the second, I suggested several lines of thinking back
in August that could lead to limiting the number of grouped_rels that
we create, but it doesn't really look like much of anything has
changed. We're still creating a partially grouped rel for every
baserel in the query, and every joinrel in the query. I'm not very
happy with "let's just turn it off by default" as the answer to that
concern. A lot of people won't enable the feature, which will mean it
doesn't have much value to our users, and those who do will still see
a lot of overhead. Maybe I'm wrong, but I bet with some good
heuristics the planning cost of this could be reduced by an order of
magnitude or more. If that were done, we could imagine eventually (or
maybe even immediately) enabling this by default; without that, we
still have the burden of maintaining this code and keeping it working,
but almost nobody will benefit.
Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
to limit the planning effort for eager aggregation. For more details,
please see [2]/messages/by-id/CAMbWs48OS3Z0G5u3fhar1=H_ucmEcUaX0tRUNpcLQxHt=z4Y7w@mail.gmail.com.
And I don't think it's correct to say that we create a partially
grouped rel for every baserel and every joinrel. This patch includes
a bunch of logic to determine whether it's appropriate to create a
grouped rel for a base or join rel. Furthermore, with the
EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
possible, we will skip it if the grouped paths are considered not
useful. All of these measures help reduce the number of grouped
paths as well as the grouped relations in many cases where eager
aggregation would not help a lot.
Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
in the planning cost, at least for the 7 queries where eager
aggregation is applied. As I said in [1]/messages/by-id/CAMbWs49DrR8Gkp3TUwFJV_1ShtmLzQUq3mOYD+GyF+Y3AmmrFw@mail.gmail.com, "For the planning time, I
do not see notable regressions for any of the seven queries". In
fact, I initially thought that we might consider enabling this by
default, given the positive benchmark results, but I just couldn't
summon the courage to do it. Perhaps we should reconsider enabling it
by default, so users can benefit from the new feature and help
identify any potential bugs.
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>off</literal>.
I'm a bit concerned about the naming here. I feel like we're adding an
increasing number of planner features with an increasing number of
disabling GUCs that are all a bit random. I kind of wonder if this
should be called enable_incremental_aggregate. Maybe that's worse,
because "eager" is a word we're not using for anything yet, so using
it here improves greppability and perhaps understandability. On the
other hand, the aggregate that is pushed down by this feature is
always partial (I believe) so we still need a finalize step later,
which means we're aggregating incrementally. There's some nice parity
with incremental sort, too, perhaps.
As I mentioned in [3]/messages/by-id/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com, the name "Eager Aggregation" is inherited from
the paper "Eager Aggregation and Lazy Aggregation" [4]https://www.vldb.org/conf/1995/P345.PDF, from which
many of the ideas in this feature are derived. Personally, I like
this name a lot, but I'm open to other names if others find it
unreasonable.
+/* The original length and hashtable of a RelInfoList */
+typedef struct
+{
+ int savelength;
+ struct HTAB *savehash;
+} RelInfoListInfo;
Both the comment and the name of the data type are completely meaningless.
Thanks. Will fix the comment and the name for this data type.
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
This would be the fifth copy of this comment. It's not entirely this
patch's fault, of course, but some kind of refactoring or cleanup is
probably needed here.
Agreed. However, I think it would be better to refactor this in a
separate patch. This issue also exists on master, and I'd prefer to
avoid introducing such refactors in this already large patch.
+ * cheapest_parameterized_paths also always includes the fewest-row
+ * unparameterized path, if there is one, for grouped relations. Different
+ * paths of a grouped relation can have very different row counts, and in some
+ * cases the cheapest-total unparameterized path may not be the one with the
+ * fewest row.
As I said back in October, this seems like mixing together in one
RelOptInfo paths that really belong to two different RelOptInfos.
I understand that you said about the design in October where
"PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
RelOptInfos", because "it's less clear whether it's fair to compare
across the two categories". I've shared my thoughts on this in [5]/messages/by-id/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com.
Furthermore, even if we separate these grouped paths into two
different RelOptInfos, we still face the issue that "different paths
of a grouped relation can have very different row counts", and we need
a way to handle this. One could argue that we can separate the
grouped paths where partial aggregation is placed at different
locations into different RelOptInfos, but this would lead to an
explosion in the number of RelOptInfos for grouped relations as we
climb up the join tree. I think this is neither realistic nor
necessary.
[1]: /messages/by-id/CAMbWs49DrR8Gkp3TUwFJV_1ShtmLzQUq3mOYD+GyF+Y3AmmrFw@mail.gmail.com
[2]: /messages/by-id/CAMbWs48OS3Z0G5u3fhar1=H_ucmEcUaX0tRUNpcLQxHt=z4Y7w@mail.gmail.com
[3]: /messages/by-id/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
[4]: https://www.vldb.org/conf/1995/P345.PDF
[5]: /messages/by-id/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com
Thanks
Richard
On Wed, Jan 15, 2025 at 1:58 AM Richard Guo <guofenglinux@gmail.com> wrote:
I understand that we're currently quite bad at estimating the number
of groups after aggregation. In fact, it's not just aggregation
estimates — we're also bad at join estimates in some cases. This is a
reality we have to face. Here's what I think: we should be trying our
best to cost each node type as accurately as possible, and then build
the upper nodes based on those costs. We should not conclude that,
because we are unable to accurately cost one node type, we should
avoid any cost-based optimizations above that node.
Well, I agree with that last sentence, for sure. But I don't think
it's true that the situations with joins and aggregates are
comparable. We are much better able to estimate the number of rows
that will come out of a join than we are to estimate the number of
rows that come out of an aggregate. It's certainly true that in some
cases we get join estimates badly wrong, and I'd like to see us do
better there, but our estimates of the number of distinct values that
exist in a column are the least reliable part of our statistics system
by far.
Also, we look at the underlying statistics for a column variable even
after joins and aggregates and assume (not having any other
information) that the distribution after that operation is likely to
be similar to the distribution before that operation. Consider a table
A with columns x and y. Let's say x is a unique ID and y is a
dependent value with some distribution over a finite range of
possibilities (e.g. a person's age). If we join table A to some other
table B on A.x = B.x and filter out some of the rows via that join,
the distribution of values in column y is likely to be altered. If the
rows are removed at random, the original distribution will prevail,
but often it won't be random and so the distribution will change in a
way we can't predict. However, guessing that the pre-join distribution of
A.y still prevails isn't crazy, and it's better than assuming we can
say nothing about the distribution.
But now let's say that after joining to B, we perform an aggregation
operation, computing the minimum value of A.y for each value of B.z. At
this point, we have no usable statistics for either output column. The
result must be unique on B.z, and the distribution of MIN(A.y) is
going to be entirely different from the distribution of A.y. Any
future joins that we perform here will have to be estimated without
any MCVs, which is going to reduce the accuracy of the estimation by a
lot. In summary, the join makes relying on our MCV information less
likely to be accurate, but the aggregate makes it impossible to rely
on our MCV information at all. In terms of the accuracy of our
results, that is a lot worse.
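(For concreteness, the shape being described here is something like the
following, with hypothetical table definitions:

create table a (x int primary key, y int); -- y is the dependent value
create table b (x int, z int);

select b.z, min(a.y) from a join b on a.x = b.x group by b.z;

After the aggregate, neither output column has an MCV list or a reliable
ndistinct estimate to fall back on, so any join placed on top of this
result is estimated nearly blind.)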
I believe the HashAggregate node in this plan faces the same problem
with inaccurate estimates. However, I don't think it's reasonable to
say that, because we cannot accurately cost the Aggregate node, we
should disregard considering JOIN_UNIQUE_OUTER/INNER.
Fair point.
Back in August, I responded to this issue by "Maybe we can run some
benchmarks first and investigate the regressions discovered on a
case-by-case basis". In October, I ran the TPC-DS benchmark at scale
10 and observed that eager aggregation was applied in 7 queries, with
no notable regressions. In contrast, Q4 and Q11 showed performance
improvements of 3–4 times. Please see [1].
I had forgotten about that, and again, fair point, but I'm concerned
that it might not be a broad enough base of queries to test against.
(7 isn't a very large number.)
Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
to limit the planning effort for eager aggregation. For more details,
please see [2].
OK, I missed this, but...
And I don't think it's correct to say that we create a partially
grouped rel for every baserel and every joinrel. This patch includes
a bunch of logic to determine whether it's appropriate to create a
grouped rel for a base or join rel. Furthermore, with the
EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
possible, we will skip it if the grouped paths are considered not
useful. All of these measures help reduce the number of grouped
paths as well as the grouped relations in many cases where eager
aggregation would not help a lot.
...it looks to me like EAGER_AGGREGATE_RATIO is used to set the
RelAggInfo's agg_useful field, which seems like it happens after the
RelOptInfo has already been created. I had been looking for something
that would avoid creating the RelOptInfo in the first place and I
didn't see it.
Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
in the planning cost, at least for the 7 queries where eager
aggregation is applied. As I said in [1], "For the planning time, I
do not see notable regressions for any of the seven queries". In
fact, I initially thought that we might consider enabling this by
default, given the positive benchmark results, but I just couldn't
summon the courage to do it. Perhaps we should reconsider enabling it
by default, so users can benefit from the new feature and help
identify any potential bugs.
If you're going to commit this, I think it would be a good idea to
enable it by default at least for now. If there are problems, it's
better to find out about them sooner rather than later. If they are
minor they can be fixed; if they are major, we can consider whether it
is better to fix them, disable the feature by default, or revert. We
can add an open item to reconsider the default setting during beta.
As I said back in October, this seems like mixing together in one
RelOptInfo paths that really belong to two different RelOptInfos.
I understand that you said about the design in October where
"PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
RelOptInfos", because "it's less clear whether it's fair to compare
across the two categories". I've shared my thoughts on this in [5].
Furthermore, even if we separate these grouped paths into two
different RelOptInfos, we still face the issue that "different paths
of a grouped relation can have very different row counts", and we need
a way to handle this. One could argue that we can separate the
grouped paths where partial aggregation is placed at different
locations into different RelOptInfos, but this would lead to an
explosion in the number of RelOptInfos for grouped relations as we
climb up the join tree. I think this is neither realistic nor
necessary.
It's possible you're right, but it does make me nervous. I do agree
that making the number of RelOptInfos explode would be really bad.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jan 15, 2025 at 11:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 15, 2025 at 1:58 AM Richard Guo <guofenglinux@gmail.com> wrote:
I understand that we're currently quite bad at estimating the number
of groups after aggregation. In fact, it's not just aggregation
estimates — we're also bad at join estimates in some cases. This is a
reality we have to face. Here's what I think: we should be trying our
best to cost each node type as accurately as possible, and then build
the upper nodes based on those costs. We should not conclude that,
because we are unable to accurately cost one node type, we should
avoid any cost-based optimizations above that node.
Well, I agree with that last sentence, for sure. But I don't think
it's true that the situations with joins and aggregates are
comparable. We are much better able to estimate the number of rows
that will come out of a join than we are to estimate the number of
rows that come out of an aggregate. It's certainly true that in some
cases we get join estimates badly wrong, and I'd like to see us do
better there, but our estimates of the number of distinct values that
exist in a column are the least reliable part of our statistics system
by far.
I totally understand that the situation with joins is better than with
aggregates, which is why I said that we're also bad at join estimates
"in some cases" - especially in the cases where we fall back to use
default selectivity estimates. A simple example:
create table t1 (a int, b int);
create table t2 (a int, b int);
insert into t1 select i, i from generate_series(1,1000)i;
insert into t2 select i, i from generate_series(1000, 1999)i;
analyze t1, t2;
explain analyze select * from t1 join t2 on t1.a > t2.a;
And here is what I got:
Nested Loop (cost=0.00..15032.50 rows=333333 width=16)
(actual time=392.841..392.854 rows=0 loops=1)
If this t1/t2 join is part of a larger SELECT query, I think the cost
estimates for the upper join nodes would likely be quite inaccurate.
I believe the HashAggregate node in this plan faces the same problem
with inaccurate estimates. However, I don't think it's reasonable to
say that, because we cannot accurately cost the Aggregate node, we
should disregard considering JOIN_UNIQUE_OUTER/INNER.
Fair point.
Back in August, I responded to this issue by "Maybe we can run some
benchmarks first and investigate the regressions discovered on a
case-by-case basis". In October, I ran the TPC-DS benchmark at scale
10 and observed that eager aggregation was applied in 7 queries, with
no notable regressions. In contrast, Q4 and Q11 showed performance
improvements of 3–4 times. Please see [1].
I had forgotten about that, and again, fair point, but I'm concerned
that it might not be a broad enough base of queries to test against.
(7 isn't a very large number.)
Yeah, I know 7 is not a large number, but this is the result I got
from running the TPC-DS benchmark. For the remaining 92 queries in
the benchmark, either the logic in this patch determines that eager
aggregation is not applicable, or the path with eager aggregation is
not the optimal one. I'd be more than happy if a benchmark query
showed significant performance regression, so it would provide an
opportunity to investigate how the cost estimates are negatively
impacting the final plan and explore ways to avoid or improve that.
If anyone can provide such a benchmark query, I'd be very grateful.
Perhaps this is another reason why we should enable this feature by
default, so we can identify such regression issues sooner rather than
later.
Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
to limit the planning effort for eager aggregation. For more details,
please see [2].
OK, I missed this, but...
And I don't think it's correct to say that we create a partially
grouped rel for every baserel and every joinrel. This patch includes
a bunch of logic to determine whether it's appropriate to create a
grouped rel for a base or join rel. Furthermore, with the
EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
possible, we will skip it if the grouped paths are considered not
useful. All of these measures help reduce the number of grouped
paths as well as the grouped relations in many cases where eager
aggregation would not help a lot.
...it looks to me like EAGER_AGGREGATE_RATIO is used to set the
RelAggInfo's agg_useful field, which seems like it happens after the
RelOptInfo has already been created. I had been looking for something
that would avoid creating the RelOptInfo in the first place and I
didn't see it.
Well, from the perspective of planning effort, what really matters is
whether the RelOptInfo for the grouped relation is added to the
PlannerInfo, as it is only then available for further joining in the
join search routine, not whether the RelOptInfo is built or not.
Building the RelOptInfo for a grouped relation is simply a makeNode
call followed by a flat copy; it doesn't require going through the
full process of determining its target list, or constructing its
restrict and join clauses, or calculating size estimates, etc.
Now, let's take a look at how the EAGER_AGGREGATE_RATIO mechanism is
used. As you mentioned, EAGER_AGGREGATE_RATIO is used to set the
agg_useful field of the RelAggInfo. For a base rel where we've
decided that aggregation can be pushed down, if agg_useful is false,
we skip building the grouped relation for it in the first place, not
to mention adding the grouped relation to the PlannerInfo. For a join
rel where aggregation can be pushed down, if agg_useful is false, we
will create a temporary RelOptInfo for its grouped relation, but we
only add this RelOptInfo to the PlannerInfo if we can generate any
grouped paths by joining its input relations. We could easily modify
make_grouped_join_rel() to create this temporary RelOptInfo only when
needed, but I'm not sure if that's necessary, since I don't have data
to suggest that the creation of this temporary RelOptInfo is a factor
in causing planning regressions.
Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
in the planning cost, at least for the 7 queries where eager
aggregation is applied. As I said in [1], "For the planning time, I
do not see notable regressions for any of the seven queries". In
fact, I initially thought that we might consider enabling this by
default, given the positive benchmark results, but I just couldn't
summon the courage to do it. Perhaps we should reconsider enabling it
by default, so users can benefit from the new feature and help
identify any potential bugs.
If you're going to commit this, I think it would be a good idea to
enable it by default at least for now. If there are problems, it's
better to find out about them sooner rather than later. If they are
minor they can be fixed; if they are major, we can consider whether it
is better to fix them, disable the feature by default, or revert. We
can add an open item to reconsider the default setting during beta.
Agreed. And I like the suggestion of adding an open item about the
default setting during beta.
As I said back in October, this seems like mixing together in one
RelOptInfo paths that really belong to two different RelOptInfos.
I understand that you said about the design in October where
"PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
RelOptInfos", because "it's less clear whether it's fair to compare
across the two categories". I've shared my thoughts on this in [5].
Furthermore, even if we separate these grouped paths into two
different RelOptInfos, we still face the issue that "different paths
of a grouped relation can have very different row counts", and we need
a way to handle this. One could argue that we can separate the
grouped paths where partial aggregation is placed at different
locations into different RelOptInfos, but this would lead to an
explosion in the number of RelOptInfos for grouped relations as we
climb up the join tree. I think this is neither realistic nor
necessary.
It's possible you're right, but it does make me nervous. I do agree
that making the number of RelOptInfos explode would be really bad.
Based on my explanation in [1]/messages/by-id/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com, I think it's acceptable to compare
grouped paths for the same grouped rel, regardless of where the
partial aggregation is placed.
I fully understand that I could be wrong about this, but I don't think
it would break anything in regular planning (i.e., planning without
eager aggregation). We would never compare a grouped path with a
non-grouped path during scan/join planning. As far as I can see, the
only consequence in that case would be that we might fail to select
the optimal grouped path and miss out on fully leveraging the benefits
of eager aggregation.
Back in November, I considered the possibility of introducing a
GroupPathInfo into the Path structure to store the location of the
partial aggregation as well as the estimated rowcount for this grouped
path, similar to how ParamPathInfo functions for parameterized paths.
However, after some exploration, I determined that this was
unnecessary.
But in any case, I don't think it's an option to separate the grouped
paths of the same grouped relation into different RelOptInfos based on
the location of the partial aggregation within the path tree.
[1]: /messages/by-id/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com
Thanks
Richard
I'm very sorry for not having had any time to look at this patch
before --- it's been on my radar screen for awhile, but $LIFE has
been rather demanding lately.
Anyway, I've now read through the mail thread and portions of the
v16 patch, and I have to concur with Robert's qualms about whether
this is ready. A few observations:
* The README addition, and the basically identical text in the
commit message, don't even provide a reason to believe that the
transformation is correct let alone that it will result in faster
execution. I don't understand why it's so hard to provide a solid
correctness argument. This work was supposedly based on an academic
paper; surely that paper must have included a correctness proof?
PG might need a few refinements, like being specific about what we
expect from the equality operators. But an EXPLAIN plan is not an
argument.
* As for the performance aspect, we're given
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
As far as I can see, this will require aggregation to be performed
across every row of "b", whereas the naive way would have aggregated
across only rows having join partners in "a". If most "b" rows lack
a join partner then this will be far slower than the naive way.
I do see that it can be better if most "b" rows have multiple join
partners, because we'll re-use partial aggregation results instead
of (effectively) recalculating them. But the README text makes it
sound like this is an unconditional win, which is not the right
mindset. (In fact, in this specific example where a.i is presumed
unique, how's it a win at all?)
* I'm also concerned about what happens with aggregates that can have
large partial-aggregation values, such as string_agg(). With the
existing usage of partial aggregation for parallel queries, it's
possible to be confident that there are not many partial-aggregation
values in existence at the same time. I don't think that holds for
pushed-down aggregates: for example, I wouldn't be surprised if the
planner chooses a join plan that requires stuffing all those values
into a hash table, or "materializes" the output of the partial
aggregation step. Do we have logic that will avoid blowing out
memory during such queries?
* I am just as worried as Robert is about the notion of different
paths for the same RelOptInfo having different rowcount estimates.
That is an extremely fundamental violation of basic planner
assumptions. We did bend it for parameterized paths by restating
those assumptions as (from optimizer/README):
To keep cost estimation rules relatively simple, we make an implementation
restriction that all paths for a given relation of the same parameterization
(i.e., the same set of outer relations supplying parameters) must have the
same rowcount estimate. This is justified by insisting that each such path
apply *all* join clauses that are available with the named outer relations.
I don't see any corresponding statement here, and it's not clear
to me that the point has been thought through adequately.
Another aspect that bothers me is that a RelOptInfo is understood
to contain a bunch of paths that all yield the same data (the same
set of columns), and it seems like that might not be the case here.
Certainly partially-aggregated paths will output something different
than unaggregated ones, but mightn't different join orders mutate the
column set even further?
I think that we might be better off building a separate RelOptInfo for
each way of pushing down the aggregates, in order to preserve the
principle that all the paths in any one RelOptInfo have the same
output. This'll mean more RelOptInfos, but not more paths, so
I doubt it adds that much performance overhead.
Richard Guo <guofenglinux@gmail.com> writes:
Back in November, I considered the possibility of introducing a
GroupPathInfo into the Path structure to store the location of the
partial aggregation as well as the estimated rowcount for this grouped
path, similar to how ParamPathInfo functions for parameterized paths.
However, after some exploration, I determined that this was
unnecessary.
Why did you determine that was unnecessary? The principal function
of ParamPathInfo IMV is to ensure that we use exactly the same
rowcount estimate for all the paths that should have the same
estimate, and that problem seems to exist here as well. If you
don't have a forcing mechanism then paths' estimates will diverge
as a result of things like different roundoff errors in different
join sequences.
Anyway, I agree with Robert that this isn't ready. I don't feel
that I can even review it adequately without a lot better internal
documentation, specifically a clearer statement of what query shapes
the optimization applies to and what's the rationale for the
transformation being correct. The commentary in pathnodes.h for the
new data structures is likewise so skimpy as to be near useless.
regards, tom lane
On Fri, Jan 17, 2025 at 6:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* The README addition, and the basically identical text in the
commit message, don't even provide a reason to believe that the
transformation is correct let alone that it will result in faster
execution. I don't understand why it's so hard to provide a solid
correctness argument. This work was supposedly based on an academic
paper; surely that paper must have included a correctness proof?
PG might need a few refinements, like being specific about what we
expect from the equality operators. But an EXPLAIN plan is not an
argument.
Thank you for taking a look at this patch!
In README, I provided the justification for the correctness of this
transformation as follows:
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
'destiny', which is crucial for maintaining correctness.
I believed that this explanation would make it clear why this
transformation is correct.
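To make that concrete with a small example (the query shape here is
borrowed from the plan above, so just for illustration):
select a.i, sum(b.y)
from a join b on a.i = b.j
group by a.i;
The partial aggregation pushed down to "b" must group by b.j, the
expression from "b" used in the upper join clause. Any two "b" rows
with equal b.j values match exactly the same set of "a" rows, so they
share the same 'destiny', and pre-aggregating them cannot change the
final result.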
Yeah, this work implements one of the transformations introduced in
paper "Eager Aggregation and Lazy Aggregation". In the paper, Section
4 presents the formalism, Section 5 proves the main theorem, and
Section 6 introduces corollaries related to this specific
transformation. I'm just not sure how to translate these theorems and
corollaries into natural language that would be suitable to be
included in the README. I can give it another try if you find the
above justification not clear enough, but it would be really helpful
if I could get some assistance with this.
And I'd like to clarify that the EXPLAIN plan included in the README
is only meant to illustrate what this transformation looks like, and is
not intended to serve as an argument for its correctness.
* As for the performance aspect, we're given
Finalize HashAggregate
Group Key: a.i
-> Nested Loop
-> Partial HashAggregate
Group Key: b.j
-> Seq Scan on b
-> Index Only Scan using a_pkey on a
Index Cond: (i = b.j)
As far as I can see, this will require aggregation to be performed
across every row of "b", whereas the naive way would have aggregated
across only rows having join partners in "a".
Yes, that's correct.
If most "b" rows lack
a join partner then this will be far slower than the naive way.
No, this is not correct. The partial aggregation may reduce the
number of input rows to the join, and the resulting data reduction
could justify the cost of performing the partial aggregation. As an
example, please consider:
create table t1 (a int, b int, c int);
create table t2 (a int, b int, c int);
insert into t1 select i%3, i%3, i from generate_series(1,1000000)i;
insert into t2 select i%3+3, i%3+3, i from generate_series(1,1000000)i;
analyze t1, t2;
explain analyze
select sum(t2.c) from t1 join t2 on t1.b = t2.b group by t1.a;
So for this query, most (actually all) t2 rows lack a join partner.
Running it with and without eager aggregation, I got (best of 3):
-- with eager aggregation
Execution Time: 496.856 ms
-- without eager aggregation
Execution Time: 1723.844 ms
I do see that it can be better if most "b" rows have multiple join
partners, because we'll re-use partial aggregation results instead
of (effectively) recalculating them.
Not only because we'll re-use partial aggregation results, but also
(and perhaps more importantly) because the number of input rows to the
join could be significantly reduced.
But the README text makes it
sound like this is an unconditional win, which is not the right
mindset.
I'm sorry if the README text gives that impression. The README says:
If the partial aggregation on table B significantly reduces the number
of input rows, the join above will be much cheaper, leading to a more
efficient final plan.
Perhaps I should use "could" or "might" instead of "will" to make it
less misleading.
But as you can see from the implementation, the decision is entirely
based on cost, not on rules. There is no part of the code that ever
assumes this transformation is an unconditional win.
(In fact, in this specific example where a.i is presumed
unique, how's it a win at all?)
It seems to me that whether it's a win depends on whether b.j is a
column with low cardinality (i.e., relatively few unique values). I
don't really see how a.i being unique would change that. Please
see the example below:
create table a (i int primary key, x int);
create table b (j int, y int);
insert into a select i, i%3 from generate_series(1,10000)i;
insert into b select i%3, i from generate_series(1,10000)i;
analyze a, b;
set enable_eager_aggregate to off;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i > b.j
GROUP BY a.i;
QUERY PLAN
--------------------------------------------------------------------------------------------------
HashAggregate (actual time=100257.254..100268.841 rows=10000 loops=1)
Group Key: a.i
Batches: 1 Memory Usage: 2193kB
Buffers: shared hit=133
-> Nested Loop (actual time=2.629..40849.630 rows=99990000 loops=1)
Buffers: shared hit=133
-> Seq Scan on b (actual time=0.450..10.066 rows=10000 loops=1)
Buffers: shared hit=45
-> Memoize (actual time=0.002..0.752 rows=9999 loops=10000)
Cache Key: b.j
Cache Mode: binary
Hits: 9997 Misses: 3 Evictions: 0 Overflows: 0
Memory Usage: 1055kB
Buffers: shared hit=88
-> Index Only Scan using a_pkey on a (actual
time=0.752..8.100 rows=9999 loops=3)
Index Cond: (i > b.j)
Heap Fetches: 0
Buffers: shared hit=88
Planning Time: 1.681 ms
Execution Time: 100273.011 ms
(19 rows)
set enable_eager_aggregate to on;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT a.i, avg(b.y)
FROM a JOIN b ON a.i > b.j
GROUP BY a.i;
QUERY PLAN
--------------------------------------------------------------------------------------------
Finalize HashAggregate (actual time=77.701..90.680 rows=10000 loops=1)
Group Key: a.i
Batches: 1 Memory Usage: 2193kB
Buffers: shared hit=133
-> Nested Loop (actual time=27.586..52.352 rows=29997 loops=1)
Buffers: shared hit=133
-> Partial HashAggregate (actual time=27.408..27.419 rows=3 loops=1)
Group Key: b.j
Batches: 1 Memory Usage: 24kB
Buffers: shared hit=45
-> Seq Scan on b (actual time=0.173..3.767 rows=10000 loops=1)
Buffers: shared hit=45
-> Index Only Scan using a_pkey on a (actual
time=0.108..5.277 rows=9999 loops=3)
Index Cond: (i > b.j)
Heap Fetches: 0
Buffers: shared hit=88
Planning Time: 1.739 ms
Execution Time: 93.003 ms
(18 rows)
There is a performance improvement of ~1000 times, even though a.i is
unique.
# select 100273.011/93.003;
?column?
-----------------------
1078.1696396890422890
(1 row)
(I used 'a.i > b.j' instead of 'a.i = b.j' to make the performance
difference more noticeable. I believe this is fine, as it doesn't
undermine the fact that a.i is unique.)
* I'm also concerned about what happens with aggregates that can have
large partial-aggregation values, such as string_agg(). With the
existing usage of partial aggregation for parallel queries, it's
possible to be confident that there are not many partial-aggregation
values in existence at the same time. I don't think that holds for
pushed-down aggregates: for example, I wouldn't be surprised if the
planner chooses a join plan that requires stuffing all those values
into a hash table, or "materializes" the output of the partial
aggregation step. Do we have logic that will avoid blowing out
memory during such queries?
Good point! Thank you for bringing this up. I hadn't considered it
before, and it seems no one else has raised this issue. I'll look
into it.
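For instance, I imagine a query along these lines (tables invented
purely for illustration) could make a good test case:
create table m (k int primary key, v int);
create table d (k int, t text);
-- if the partial aggregation is pushed down to "d", one (possibly
-- large) string_agg() transition value per distinct d.k may be kept
-- alive at once, e.g. inside the hash table built for the join
select m.v, string_agg(d.t, ',')
from m join d on m.k = d.k
group by m.v;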
* I am just as worried as Robert is about the notion of different
paths for the same RelOptInfo having different rowcount estimates.
That is an extremely fundamental violation of basic planner
assumptions. We did bend it for parameterized paths by restating
those assumptions as (from optimizer/README):
To keep cost estimation rules relatively simple, we make an implementation
restriction that all paths for a given relation of the same parameterization
(i.e., the same set of outer relations supplying parameters) must have the
same rowcount estimate. This is justified by insisting that each such path
apply *all* join clauses that are available with the named outer relations.
I don't see any corresponding statement here, and it's not clear
to me that the point has been thought through adequately.
Another aspect that bothers me is that a RelOptInfo is understood
to contain a bunch of paths that all yield the same data (the same
set of columns), and it seems like that might not be the case here.
Certainly partially-aggregated paths will output something different
than unaggregated ones, but mightn't different join orders mutate the
column set even further?
I think that we might be better off building a separate RelOptInfo for
each way of pushing down the aggregates, in order to preserve the
principle that all the paths in any one RelOptInfo have the same
output. This'll mean more RelOptInfos, but not more paths, so
I doubt it adds that much performance overhead.
Hmm, IIUC, this means that we would separate the grouped paths of the
same grouped relation into different RelOptInfos based on the location
of the partial aggregation within the path tree. Let's define the
"location" as the relids of the relation on top of which we place the
partial aggregation. For grouped relation {A B C D}, if we perform
some aggregation on C, we would end up with 8 different grouped paths:
{A B D PartialAgg(C)}
{B D PartialAgg(A C)}
{A D PartialAgg(B C)}
{A B PartialAgg(D C)}
{D PartialAgg(A B C)}
{B PartialAgg(A D C)}
{A PartialAgg(B D C)}
{PartialAgg(A B D C)}
That means we would need to create 8 RelOptInfos for this grouped
relation (one for each of the 2^3 subsets of {A, B, D} that can be
joined beneath the partial aggregation). If my math doesn't fail me,
for a relation containing n base rels, we would need to create 2^(n-1)
different RelOptInfos.
When building grouped relation {A B C D E} by joining {A B C D} with
{E}, we would need to call make_grouped_join_rel() 8 times, each time
joining {E} with one of the 8 RelOptInfos mentioned above. And at
last, considering other join orders such as joining {A B C E} with
{D}, this new grouped relation would end up with 16 new RelOptInfos.
And then we proceed with building grouped relation {A B C D E F}, and
end up with 32 new RelOptInfos, and this process continues...
It seems to me that this doesn't only result in more RelOptInfos, it
can also lead to more paths. Consider two grouped paths, say P1 and
P2, for the same grouped relation, but with different locations of the
partial aggregation. Suppose P1 is cheaper, at least as well ordered,
generates no more rows, requires no outer rels not required by P2, and
is no less parallel-safe. If these two paths are kept in the same
RelOptInfo, P2 will be discarded and not considered in further
planning. However, if P1 and P2 are separated into different
RelOptInfos, and P2 happens to have survived the add_path() tournament
for the RelOptInfo it is in, then it will be considered in subsequent
planning steps.
So in any case, this doesn't seem like a feasible approach to me.
I also have some thoughts on grouped paths and parameterized paths,
but I've run out of time for today. I'll send a separate email.
I'm really glad you're taking a look at this patch. Thank you!
Thanks
Richard
On Thu, Jan 16, 2025 at 3:18 AM Richard Guo <guofenglinux@gmail.com> wrote:
If this t1/t2 join is part of a larger SELECT query, I think the cost
estimates for the upper join nodes would likely be quite inaccurate.
That's definitely true. However, the question is not whether the
planner has problems today (it definitely does) but whether it's OK to
make this change without improving our ability to estimate the effects
of aggregation operations. I understand that you (quite rightly) don't
want to get sucked into fixing unrelated planner problems, and I'm
also not sure to what extent these problems are actually fixable.
However, major projects sometimes require such work. For instance,
commit 5edc63bda68a77c4d38f0cbeae1c4271f9ef4100 was motivated by the
discovery that it was too easy to get a Parallel Bitmap Heap Scan plan
even when it wasn't best. The fact that the costing wasn't right
wasn't the fault of parallel query, but parallel query still needed to
do something about it to get good results.
Yeah, I know 7 is not a large number, but this is the result I got
from running the TPC-DS benchmark. For the remaining 92 queries in
the benchmark, either the logic in this patch determines that eager
aggregation is not applicable, or the path with eager aggregation is
not the optimal one. I'd be more than happy if a benchmark query
showed significant performance regression, so it would provide an
opportunity to investigate how the cost estimates are negatively
impacting the final plan and explore ways to avoid or improve that.
If anyone can provide such a benchmark query, I'd be very grateful.
Yes, having more people test this and look for regressions would be
quite valuable.
Well, from the perspective of planning effort, what really matters is
whether the RelOptInfo for the grouped relation is added to the
PlannerInfo, as it is only then available for further joining in the
join search routine, not whether the RelOptInfo is built or not.
Building the RelOptInfo for a grouped relation is simply a makeNode
call followed by a flat copy; it doesn't require going through the
full process of determining its target list, or constructing its
restrict and join clauses, or calculating size estimates, etc.
That's probably mostly true, but the overhead of memory allocations in
planner routines is not trivial. There are previous cases of changing
things or declining to change this purely on the number of palloc
cycles involved.
It's possible you're right, but it does make me nervous. I do agree
that making the number of RelOptInfos explode would be really bad.
Based on my explanation in [1], I think it's acceptable to compare
grouped paths for the same grouped rel, regardless of where the
partial aggregation is placed.
I fully understand that I could be wrong about this, but I don't think
it would break anything in regular planning (i.e., planning without
eager aggregation).
I think you might be taking too narrow a view of the problem. As Tom
says, the issue is that this breaks a bunch of assumptions that hold
elsewhere. One place that shows up in the patch is in the special-case
logic you've added to set_cheapest(), but I fear that won't be the end
of it. It seems a bit surprising to me that you didn't also need to
adjust add_path(), for example. Even if you don't, there's lots of
places that rely on the assumption that all paths for a RelOptInfo are
returning the same set of rows. If it turns out that a bunch of those
places need to be adjusted to work with this, then the code could
potentially end up quite messy, and that might also have performance
consequences, even when this feature is disabled. Many of the code
paths that deal with paths in the planner are quite hot.
To say that another way, I'm not so much worried about the possibility
that the patch contains a bug. Patches contain bugs all the time and
we can just fix them. It's not wonderful, but that's how software
development goes. What I am worried about is whether the architecture
is right. If we go with one RelOptInfo when the "right answer" is
many, or for that matter if we go with many when the right answer is
one, those are things that cannot be easily and reasonably patched
post-commit, and especially not post-release.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jan 18, 2025 at 6:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 16, 2025 at 3:18 AM Richard Guo <guofenglinux@gmail.com> wrote:
If this t1/t2 join is part of a larger SELECT query, I think the cost
estimates for the upper join nodes would likely be quite inaccurate.
That's definitely true. However, the question is not whether the
planner has problems today (it definitely does) but whether it's OK to
make this change without improving our ability to estimate the effects
of aggregation operations. I understand that you (quite rightly) don't
want to get sucked into fixing unrelated planner problems, and I'm
also not sure to what extent these problems are actually fixable.
However, major projects sometimes require such work. For instance,
commit 5edc63bda68a77c4d38f0cbeae1c4271f9ef4100 was motivated by the
discovery that it was too easy to get a Parallel Bitmap Heap Scan plan
even when it wasn't best. The fact that the costing wasn't right
wasn't the fault of parallel query, but parallel query still needed to
do something about it to get good results.
Yeah, it's true that we have problems in aggregate estimates today.
And it has been the case for a long time. In the past, we made some
improvements in this area, such as in 84f9a35e3, where we adapted a
new formula that is based on the random selection probability,
inspired by two papers from Yao and Dell'Era. But we still have
problems with aggregate estimates. I'm not sure when we could fix
these problems, but I doubt that it will happen in the near future.
(Sorry to be pessimistic.)
If, at last, the conclusion of this discussion is that we should not
apply this change until we fix those problems in aggregate estimates,
I'd be very sad. This conclusion is absolutely correct, for sure, in
an ideal world, but in the real world, it feels like a death sentence
for this patch, and for any future patches that attempt to apply some
optimizations above aggregate nodes - unless, of course, the day
arrives when we finally fix those aggregate estimate problems, which
doesn't seem likely in the near future.
And if that's the case, can I then argue that the same principle
should apply to joins? Specifically, should we refrain from applying
any optimizations above join nodes until we've fixed the join estimate
problems, particularly in cases where we fall back on default
selectivity estimates?
Please do not get me wrong. I'm not saying that we should not fix the
problems in our current aggregate estimates. I think, as I said
previously, that the realistic approach is to first identify some
real-world queries where this patch causes significant performance
regressions. This would give us the opportunity to investigate these
regressions and understand how the bad cost estimates contributed to
them. From there, we could figure out where to start fixing the cost
estimates. And if we find that the problem is not entirely fixable,
we could then explore the possibility of introducing new heuristics to
avoid the performance regressions as much as possible. In my opinion,
it's not very possible to make cost estimation perfect in all cases.
In a sense, cost estimation is an art of compromise.
I believe this is also the approach that commit 5edc63bda followed.
First, it was found that Bitmap Heap Scans caused performance
regressions in many TPCH queries in cases where work_mem was low.
Then, this issue was thoroughly discussed, and eventually it was
figured out that the impact of lossy pages needed to be accounted for
when estimating the cost of bitmap scans, which became 5edc63bda.
Well, from the perspective of planning effort, what really matters is
whether the RelOptInfo for the grouped relation is added to the
PlannerInfo, as it is only then available for further joining in the
join search routine, not whether the RelOptInfo is built or not.
Building the RelOptInfo for a grouped relation is simply a makeNode
call followed by a flat copy; it doesn't require going through the
full process of determining its target list, or constructing its
restrict and join clauses, or calculating size estimates, etc.
That's probably mostly true, but the overhead of memory allocations in
planner routines is not trivial. There are previous cases of changing
things or declining to change this purely on the number of palloc
cycles involved.
Hmm, I think you are right. I will modify make_grouped_join_rel() to
create the RelOptInfo for a grouped join relation only if we can
generate any grouped paths by joining its input relations.
It's possible you're right, but it does make me nervous. I do agree
that making the number of RelOptInfos explode would be really bad.
Based on my explanation in [1], I think it's acceptable to compare
grouped paths for the same grouped rel, regardless of where the
partial aggregation is placed.
I fully understand that I could be wrong about this, but I don't think
it would break anything in regular planning (i.e., planning without
eager aggregation).
I think you might be taking too narrow a view of the problem. As Tom
says, the issue is that this breaks a bunch of assumptions that hold
elsewhere. One place that shows up in the patch is in the special-case
logic you've added to set_cheapest(), but I fear that won't be the end
of it. It seems a bit surprising to me that you didn't also need to
adjust add_path(), for example. Even if you don't, there's lots of
places that rely on the assumption that all paths for a RelOptInfo are
returning the same set of rows. If it turns out that a bunch of those
places need to be adjusted to work with this, then the code could
potentially end up quite messy, and that might also have performance
consequences, even when this feature is disabled. Many of the code
paths that deal with paths in the planner are quite hot.
Yeah, one of the basic assumptions in the planner is that all paths
for a given RelOptInfo return the same set of rows. One exception
to this is parameterized paths. As an example, please consider:
create table t (a int, b int);
create table t3 (a int, b int);
insert into t select i, i from generate_series(1,1000)i;
insert into t3 select i, i from generate_series(1,1000)i;
create index on t3(a, b);
analyze t, t3;
explain (costs off)
select * from t t1 join t t2 on true join t3 on t3.a > t1.a and t3.b > t2.b;
With gdb, I found the following 4 paths in the pathlist of RelOptInfo
of {t3}:
{INDEXPATH
:path.pathtype 341
:parent_relids (b 4)
:required_outer (b 1 2)
:path.parallel_aware false
:path.parallel_safe true
:path.parallel_workers 0
:path.rows 111
:path.disabled_nodes 0
:path.startup_cost 0.275
:path.total_cost 4.755000000000001
{INDEXPATH
:path.pathtype 341
:parent_relids (b 4)
:required_outer (b 1)
:path.parallel_aware false
:path.parallel_safe true
:path.parallel_workers 0
:path.rows 333
:path.disabled_nodes 0
:path.startup_cost 0.275
:path.total_cost 6.1425
{INDEXPATH
:path.pathtype 341
:parent_relids (b 4)
:required_outer (b 2)
:path.parallel_aware false
:path.parallel_safe true
:path.parallel_workers 0
:path.rows 333
:path.disabled_nodes 0
:path.startup_cost 0.275
:path.total_cost 11.145
{PATH
:pathtype 338
:parent_relids (b 4)
:required_outer (b)
:parallel_aware false
:parallel_safe true
:parallel_workers 0
:rows 1000
:disabled_nodes 0
:startup_cost 0
:total_cost 15
None of them are returning the same set of rows. This is fine because
we have revised the assumption to be that all paths for a RelOptInfo of
the same parameterization return the same set of rows. That is to
say, it's OK that paths for the same RelOptInfo return different sets
of rows if they have different parameterizations.
Now we have the grouped paths. I had previously considered further
revising this assumption to be that all paths for a RelOptInfo of the
same parameterization and the same location of partial aggregation
return the same set of rows. That's why, back in November, I proposed
the idea of introducing a GroupPathInfo into the Path structure to
store the location of the partial aggregation and the estimated
rowcount for each grouped path, similar to how ParamPathInfo functions
for parameterized paths.
However, I gave up on this idea in December after realizing an
important difference from the parameterized path case. For a
parameterized path, the fewer the required outer rels, the better, as
more outer rels imply more join restrictions. In other words, the
number of required outer rels is an important factor when comparing
two paths in add_path(). For a grouped path, however, the location of
partial aggregation does not impose such restrictions for further
planning. As long as one grouped path is cheaper than another based
on the current merits of add_path(), we don't really care where the
partial aggregation is placed within the path tree.
I can take up the idea of GroupPathInfo again. Before I start
implementing it (which is not trivial), I'd like to hear others'
thoughts on this approach - whether it's necessary and whether this is
the right direction to pursue.
To say that another way, I'm not so much worried about the possibility
that the patch contains a bug. Patches contain bugs all the time and
we can just fix them. It's not wonderful, but that's how software
development goes. What I am worried about is whether the architecture
is right. If we go with one RelOptInfo when the "right answer" is
many, or for that matter if we go with many when the right answer is
one, those are things that cannot be easily and reasonably patched
post-commit, and especially not post-release.
Fair point. We should make sure the architecture of this patch is
solid before committing it.
Regarding whether we should use a single RelOptInfo or separate
RelOptInfos for the same grouped relation: If we choose to separate
the grouped paths of the same grouped relation into different
RelOptInfos based on the location of the partial aggregation within
the path tree, then, based on my calculation from the previous email,
for a relation containing n base rels, we would need to create 2^(n-1)
different RelOptInfos, not to mention that this can also lead to more
paths. I still struggle to see how this is feasible. Could you
please elaborate on why you believe this is a viable option?
Thanks
Richard
On Sun, Jan 19, 2025 at 7:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
If, at last, the conclusion of this discussion is that we should not
apply this change until we fix those problems in aggregate estimates,
I'd be very sad. This conclusion is absolutely correct, for sure, in
an ideal world, but in the real world, it feels like a death sentence
for this patch, and for any future patches that attempt to apply some
optimizations above aggregate nodes - unless, of course, the day
arrives when we finally fix those aggregate estimate problems, which
doesn't seem likely in the near future.
Well, such conclusions should be based on evidence. So far, the
evidence you've presented suggests that the optimization works, so
there's no reason to leap to the conclusion that we shouldn't move
forward. On the other hand, the amount of evidence you've presented
does not seem to me to be all that large. And I'm not sure that you've
gone looking for adversarial cases.
And if that's the case, can I then argue that the same principle
should apply to joins? Specifically, should we refrain from applying
any optimizations above join nodes until we've fixed the join estimate
problems, particularly in cases where we fall back on default
selectivity estimates?
I am having a hard time figuring out how to write back to this. I
mean, I don't think that what you write here is a serious proposal,
and I think you already know that I was not proposing any such thing.
But it upsets me that you think that this hypothetical argument is
equivalent to the ones I've actually been making. Apparently, you
consider my concerns quite groundless and foolish.
Yeah, one of the basic assumptions in the planner is that all paths
for a given RelOptInfo return the same set of rows. One exception
to this is parameterized paths.
Good point. I had not considered this parallel.
Now we have the grouped paths. I had previously considered further
revising this assumption to be that all paths for a RelOptInfo of the
same parameterization and the same location of partial aggregation
return the same set of rows. That's why, back in November, I proposed
the idea of introducing a GroupPathInfo into the Path structure to
store the location of the partial aggregation and the estimated
rowcount for each grouped path, similar to how ParamPathInfo functions
for parameterized paths.
Interesting.
However, I gave up on this idea in December after realizing an
important difference from the parameterized path case. For a
parameterized path, the fewer the required outer rels, the better, as
more outer rels imply more join restrictions. In other words, the
number of required outer rels is an important factor when comparing
two paths in add_path(). For a grouped path, however, the location of
partial aggregation does not impose such restrictions for further
planning. As long as one grouped path is cheaper than another based
on the current merits of add_path(), we don't really care where the
partial aggregation is placed within the path tree.
I can take up the idea of GroupPathInfo again. Before I start
implementing it (which is not trivial), I'd like to hear others'
thoughts on this approach - whether it's necessary and whether this is
the right direction to pursue.
Yes, I would, too. Tom, do you have any thoughts on this point? Anybody else?
An advantage of this approach could be that it would avoid any
explosion in the number of RelOptInfo structures, since presumably all
the partially aggregated paths could be attached to the same
RelOptInfo as the unaggregated paths, just with a GroupPathInfo to
mark them as partially aggregated. I have to admit that I'm not sure
it was the right idea to mix parameterized and unparameterized paths
in the same path list, and I'm even less sure that it would be a good
idea to mix in partially-aggregated paths. That's because a
parameterized path behaves like a regular path with a join
order/method restriction: as long as we only create valid joins from
parameterized paths, we'll eventually end up with unparameterized
paths without doing anything else. A partially aggregated path behaves
more like a partial path, which requires a Gather or Gather Merge node
to terminate parallelism. Likewise, a partially aggregated path will
require a FinalizeAggregate step to complete the aggregation. Maybe
that's the wrong way of thinking about it, though, since the
FinalizeAggregate node must (I think) go at the top of the join tree,
whereas a Gather can go anywhere.
I felt it best when implementing parallel query to put partial paths
into a separate list, rather than mixing them into the regular path
list. I am vaguely under the impression that Tom thinks that was a
poor decision on my part. And I can sort of see that there is a
problem brewing here. If we handled this case like that one, then we'd
go from 2 lists to 4: normal paths, paths needing a FinalizeAggregate,
paths needing a Gather(Merge), paths needing both. And if we handled
one more future thing in the same way, then the number of combinations
doubles again to 8. Clearly, that way lies madness. On the other hand,
there's another kind of madness in thinking that we can just stick a
whole bunch of paths that are different from each other in an
increasing number of ways into a single path list and suffer no
adverse consequences. The growing complexity of add_path() is one
fairly obvious one.
So I don't quite know which way to jump here. It now seems to me that
we have three similar features with three different designs.
Parameterization added non-comparable paths to the same path list;
parallel query added them to a different path list in the same
RelOptInfo; and this patch currently adds them a separate RelOptInfo.
That's quite a bit of diversity. Really, if we wanted to stick
strictly to the idea of paths associated with the same RelOptInfo
being directly comparable, then parameterization should have spawned a
separate RelOptInfo for each workable parameterization, but that
wasn't done, possibly (though I'm not sure) for the same reasons that
you don't want to do it here.
Regarding whether we should use a single RelOptInfo or separate
RelOptInfos for the same grouped relation: If we choose to separate
the grouped paths of the same grouped relation into different
RelOptInfos based on the location of the partial aggregation within
the path tree, then, based on my calculation from the previous email,
for a relation containing n base rels, we would need to create 2^(n-1)
different RelOptInfos, not to mention that this can also lead to more
paths. I still struggle to see how this is feasible. Could you
please elaborate on why you believe this is a viable option?
I agree that creating an exponential number of RelOptInfos is not
going to work out well. I haven't been quite as certain as you seem to
be that it's an unavoidable reality, but maybe it is. For instance, my
intuition is that if PartialAgg(t1) JOIN t2 and PartialAgg(t1 JOIN t2)
produce very different numbers of rows, we could probably just take
the one with the smaller row count regardless of cost, because the
whole selling point of this optimization is that we reduce the number
of rows that are being fed to higher level plan nodes. I don't quite
see how it can make sense to keep a less costly path that produces
more rows on the theory that maybe it's going to work out better later
on. Why is the path cheaper, after all? It feels like the savings must
come from not reducing the row count so much, but that is a cost we're
going to have to repay at a higher plan level. Moreover, we'll be
repaying it with interest, because more rows will have filtered
through every level of plan over which we postponed partial
aggregation.
I admit it's not so clear-cut when the row counts are close. If
PartialAgg(t1 JOIN t2) JOIN t3 has a very similar row count to PartialAgg(t1
JOIN t3) JOIN t2, can we categorically pick whichever one has the
lower row count and forget about the other? I'm not sure. But I have
an uncomfortable feeling that if we can't, we're going to have an
explosion in the number of paths we have to generate even if we avoid
an explosion in the number of RelOptInfos we generate.
For example, consider:
SELECT ... FROM fact f, dim1, dim2, dim3, dim4
WHERE f.dim1_id = dim1.id AND f.dim2_id = dim2.id
AND f.dim3_id = dim3.id AND f.dim4_id = dim4.id
GROUP BY f.something;
Let's assume that each dimN table has PRIMARY KEY (id). Because of the
primary keys, it's only sensible to consider partial aggregation for
subsets of rels that include f; and it doesn't make sense to consider
partially aggregating after joining all 5 tables because at that point
we should just do a single-step aggregation. So, the partially
grouped-rel for {f,dim1,dim2,dim3,dim4} can contain paths generated in
15 different ways, because we can join f to any proper subset of
{dim1,dim2,dim3,dim4} before partially aggregating and then to the
remainder after partially aggregating. But that feels like we're
re-performing essentially the same join search 16 times which seems
super-expensive. I can't quite say that the work is useless or that I
have a better idea, but I guess there will be a lot of cases where all
16 join searches produce the same results, or most of them do. It
doesn't feel to me like checking through all of those possibilities is
a good expenditure of planner effort.
I took a look at the paper you linked in the original post, but
unfortunately it doesn't seem to say much about how to search the plan
space efficiently. I wonder if other systems perform a search that is as
exhaustive as the one that you are proposing to perform here or
whether they apply some heuristics to limit the search space, and if
so, what those heuristics are.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
So I don't quite know which way to jump here. It now seems to me that
we have three similar features with three different designs.
Parameterization added non-comparable paths to the same path list;
parallel query added them to a different path list in the same
RelOptInfo; and this patch currently adds them a separate RelOptInfo.
Yeah, this. I don't think that either of those first two decisions
was wrong, but it does seem annoying that this patch wants to do it
yet a third way. Still, it may be the right thing. Bear with me a
moment:
We dealt with parameterized paths being in the same list as
non-parameterized paths by treating the set of parameter rels as a
figure-of-merit that add_path can compare. This works because if,
say, a nonparameterized path dominates a parameterized one on every
other figure of merit then there's no point in keeping the
parameterized one. It is squirrely that the parameterized paths
typically don't yield the same number of rows as others for the same
RelOptInfo, but at least so far that hasn't broken anything. I think
it's important that the parameterized paths do yield the same column
set as other paths for the rel; and the rows they do yield will be a
subset of the rows that nonparameterized paths yield.
On the other hand, it's not sensible for partial paths to compete
in an add_path tournament with non-partial ones. If they did, neither
group could be allowed to dominate the other group, so add_path would
just be wasting its time making those path comparisons. So I do think
it was right to put them in a separate path list. Importantly, they
generate the same column set and some subset of the same rows that
the non-partial ones do, which I think is what justifies putting
them into the same RelOptInfo.
However, a partial-aggregation path does not generate the same data
as an unaggregated path, no matter how fuzzy you are willing to be
about the concept. So I'm having a very hard time accepting that
it ought to be part of the same RelOptInfo, and thus I don't really
buy that annotating paths with a GroupPathInfo is the way forward.
What this line of analysis doesn't tell us though is whether paths
that did their partial aggregations at different join levels can be
considered as enough alike that they should compete on cost terms.
If they are, we need to put them into the same RelOptInfo. So
while I want to have separate RelOptInfos for partially aggregated
paths, I'm unclear on how many of those we need or what their
identifying property is.
Also: we avoid generating parameterized partial paths, because
combining those things would be too much of a mess. There's some
handwaving in the comments for add_partial_path to the effect that
it wouldn't be a win anyway, but I think the real reason is that
it'd be far too complicated for the potential value. Can we make
a similar argument for partial aggregation? I sure hope so.
I agree that creating an exponential number of RelOptInfos is not
going to work out well.
FWIW, I'm way more concerned about the number of Paths considered
than I am about the number of RelOptInfos. This relates to your
question about whether we want to use some heuristics to limit
the planner's search space.
regards, tom lane
On Mon, Jan 20, 2025 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
However, a partial-aggregation path does not generate the same data
as an unaggregated path, no matter how fuzzy you are willing to be
about the concept. So I'm having a very hard time accepting that
it ought to be part of the same RelOptInfo, and thus I don't really
buy that annotating paths with a GroupPathInfo is the way forward.
Seems like a fair argument. I'm not sure it's dispositive if practical
considerations merited the opposite treatment, but that doesn't seem
to be the case.
What this line of analysis doesn't tell us though is whether paths
that did their partial aggregations at different join levels can be
considered as enough alike that they should compete on cost terms.
If they are, we need to put them into the same RelOptInfo. So
while I want to have separate RelOptInfos for partially aggregated
paths, I'm unclear on how many of those we need or what their
identifying property is.
Also: we avoid generating parameterized partial paths, because
combining those things would be too much of a mess. There's some
handwaving in the comments for add_partial_path to the effect that
it wouldn't be a win anyway, but I think the real reason is that
it'd be far too complicated for the potential value. Can we make
a similar argument for partial aggregation? I sure hope so.
I think your hopes will be dashed in this instance. Suppose we have:
SELECT m.mapped_value, SUM(g.summable_quantity)
FROM mapping_table m JOIN gigantic_dataset g
WHERE m.raw_value = g.raw_value GROUP BY 1;
If the mapping_table is small, we can do something like this:
FinalizeAggregate
-> Gather
-> PartialAggregate
-> Hash Join
But if mapping_table is big but relatively few of the keys appear as
raw values in gigantic_dataset, being able to do the PartialAggregate
before the join would be rather nice; and you wouldn't want to give up
on parallel query in such a case.
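For concreteness, the plan shape I have in mind there would be
something like this (sketched by hand, not real planner output):
FinalizeAggregate
  -> Hash Join
       -> Gather
            -> PartialAggregate
                 -> Parallel Seq Scan on gigantic_dataset g
       -> Hash
            -> Seq Scan on mapping_table m
i.e. the partial aggregation happens below the Gather, before the
join, and the FinalizeAggregate at the top combines the per-group
states once the raw values have been mapped.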
P.S. I'm not so sure you're right about the reason why this is
supported. We can create a partial path for a joinrel by picking a
partial path on one side and a non-partial path on the other side, so
we can get NestLoops below Gather just fine using the parameterized
paths that we're generating anyway to support non-parallel cases. But
what would the plan look like if we were using a partial,
parameterized path? That path would have to be on the inner side of a
nested loop, and as far as I can see it would need to have a Gather
node on top of it and below the Nested Loop, so you're talking about
something like this:
Nested Loop
-> Seq Scan on something
-> Gather
-> Nested Loop
-> Index Scan on otherthing
Index Cond: otherthing.x = something.x
-> Whatever Scan on whatever
But putting Gather on the inner side of a nested loop like that would
mean repeatedly starting up workers and shutting them down again which
seems no fun at all. If there's some way of using a partial,
parameterized path that doesn't involve sticking a Gather on the inner
side of a Nested Loop, then the technique might have some promise and
the existing comment (which I probably wrote) is likely bunk.
I agree that creating an exponential number of RelOptInfos is not
going to work out well.
FWIW, I'm way more concerned about the number of Paths considered
than I am about the number of RelOptInfos. This relates to your
question about whether we want to use some heuristics to limit
the planner's search space.
I had that instinct, too, but I'm not 100% sure whether it was a
correct instinct. If we create too many Paths, it's possible that most
of them will be thrown away before we really do anything with them, in
which case they cost CPU cycles but there's no memory accumulation.
Mixing paths that perform the partial aggregation at different levels
into the same RelOptInfo also increases the chances that you're going
to throw away a lot of stuff early. On the other hand, if we create
too many RelOptInfos, that memory can't be freed until the end of the
planning cycle. If you wouldn't have minded waiting a long time for
the planner, but you do mind running out of memory, the second one is
worse. But of course, the best option is to consider neither too many
Paths nor too many RelOptInfos.
I have heard rumors that in some other systems, they decide on the
best serial plan first and then insert parallel operators afterward.
Something like that could potentially be done here, too: only explore
eager aggregation for join orders that are sub-parts of the best
non-eagerly-aggregated join order. But I am sort of hesitant to
propose it as a development direction because we've never done
anything like that before and I don't think it would be at all easy to
implement. Nonetheless, I can't help feeling like we're kidding
ourselves a little bit, not just with this patch but in general. We
talk about "pushing down" aggregates or sorts or operations that can
be done on foreign nodes, but that implies that we start with them at
the top and then try to move them downward. In fact, we consider them
everywhere and expect the pushed-down versions to win out on cost.
While that feels sensible to some degree, it means every major new
query planning technique tends to multiply the amount of planner work
we're doing rather than adding to it. I'm fairly sure that the best
parallel plan need not be a parallelized version of the best
non-parallel plan but it often is, and the more things parallelism
supports, the more likely it is that it will be (I think). With eager
aggregation, it feels like we're multiplying the number of times that
we replan the same join tree by a number that is potentially MUCH
larger than 2, yet it seems to me that much of that extra work is
unlikely to find anything. Even if we find a way to make it work here
without too much pain, I wonder what happens when the next interesting
optimization comes along. Multiplication by a constant greater than or
equal to two isn't an operation one can do too many times, generally
speaking, without ending up with a big number.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jan 21, 2025 at 1:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Jan 19, 2025 at 7:53 AM Richard Guo <guofenglinux@gmail.com> wrote:
If, at last, the conclusion of this discussion is that we should not
apply this change until we fix those problems in aggregate estimates,
I'd be very sad. This conclusion is absolutely correct, for sure, in
an ideal world, but in the real world, it feels like a death sentence
for this patch, and for any future patches that attempt to apply some
optimizations above aggregate nodes - unless, of course, the day
arrives when we finally fix those aggregate estimate problems, which
doesn't seem likely in the near future.
Well, such conclusions should be based on evidence. So far, the
evidence you've presented suggests that the optimization works, so
there's no reason to leap to the conclusion that we shouldn't move
forward. On the other hand, the amount of evidence you've presented
does not seem to me to be all that large. And I'm not sure that you've
gone looking for adversarial cases.
And if that's the case, can I then argue that the same principle
should apply to joins? Specifically, should we refrain from applying
any optimizations above join nodes until we've fixed the join estimate
problems, particularly in cases where we fall back on default
selectivity estimates?
I am having a hard time figuring out how to write back to this. I
mean, I don't think that what you write here is a serious proposal,
and I think you already know that I was not proposing any such thing.
But it upsets me that you think that this hypothetical argument is
equivalent to the ones I've actually been making. Apparently, you
consider my concerns quite groundless and foolish.
I'm really sorry if my previous response upset you or gave the wrong
impression. That was never my intention, and I certainly do not
consider your concerns to be groundless or foolish. I can see how my
message may have come across differently than I intended. To clarify,
I wasn't suggesting that your concerns about the estimates weren't
valid. Rather, I was trying to express that it might be more
effective to fix the cost estimates based on specific regressions.
Regarding whether we should use a single RelOptInfo or separate
RelOptInfos for the same grouped relation: If we choose to separate
the grouped paths of the same grouped relation into different
RelOptInfos based on the location of the partial aggregation within
the path tree, then, based on my calculation from the previous email,
for a relation containing n base rels, we would need to create 2^(n-1)
different RelOptInfos, not to mention that this can also lead to more
paths. I still struggle to see how this is feasible. Could you
please elaborate on why you believe this is a viable option?
I agree that creating an exponential number of RelOptInfos is not
going to work out well. I haven't been quite as certain as you seem to
be that it's an unavoidable reality, but maybe it is. For instance, my
intuition is that if PartialAgg(t1) JOIN t2 and PartialAgg(t1 JOIN t2)
produce very different numbers of rows, we could probably just take
the one with the smaller row count regardless of cost, because the
whole selling point of this optimization is that we reduce the number
of rows that are being fed to higher level plan nodes. I don't quite
see how it can make sense to keep a less costly path that produces
more rows on the theory that maybe it's going to work out better later
on. Why is the path cheaper, after all? It feels like the savings must
come from not reducing the row count so much, but that is a cost we're
going to have to repay at a higher plan level. Moreover, we'll be
repaying it with interest, because more rows will have filtered
through every level of plan over which we postponed partial
aggregation.
I've been thinking about this proposal, and it's quite appealing. It
would significantly reduce both the planning effort and implementation
complexity, while still yielding reasonable planning results.
One concern I have with this proposal is that, as we climb up higher
and higher in the join tree, the assumption that a path with smaller
row count and higher cost is better than one with larger row count and
lower cost may gradually no longer hold. It's true that a path with a
smaller row count is generally better for upper join nodes, as it
feeds fewer rows to upper join nodes. However, as there are fewer and
fewer upper join nodes left, the efficiency gained from the smaller
row count may no longer justify the high cost of that path
itself.
Here's an example I found that can help illustrate what I mean.
create table t (a int, b int, c int);
insert into t select i%3, i%3, i from generate_series(1,500)i;
analyze t;
set enable_eager_aggregate to on;
And here are two plans for the same query:
-- Plan 1
explain (costs on)
select sum(t4.c) from t t1 join
(t t2 join t t3 on t2.b != t3.b join t t4 on t3.b = t4.b)
on t1.b = t2.b
group by t1.a;
QUERY PLAN
------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=4135.19..4135.22 rows=3 width=12)
Group Key: t1.a
-> Hash Join (cost=1392.13..3301.85 rows=166668 width=12)
Hash Cond: (t2.b = t1.b)
-> Nested Loop (cost=1377.88..1409.66 rows=1000 width=12)
Join Filter: (t2.b <> t3.b)
-> Partial HashAggregate (cost=1377.88..1377.91
rows=3 width=12)
Group Key: t3.b
-> Hash Join (cost=14.25..961.22 rows=83334 width=8)
Hash Cond: (t3.b = t4.b)
-> Seq Scan on t t3 (cost=0.00..8.00
rows=500 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=8)
-> Seq Scan on t t4
(cost=0.00..8.00 rows=500 width=8)
-> Materialize (cost=0.00..10.50 rows=500 width=4)
-> Seq Scan on t t2 (cost=0.00..8.00 rows=500 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=8)
-> Seq Scan on t t1 (cost=0.00..8.00 rows=500 width=8)
(17 rows)
-- Plan 2
explain (costs on)
select sum(t4.c) from t t1 join
(t t2 join t t3 on t2.b != t3.b join t t4 on t3.b = t4.b)
on t1.b = t2.b
group by t1.a;
QUERY PLAN
------------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=455675.44..455675.47 rows=3 width=12)
Group Key: t1.a
-> Hash Join (cost=455658.07..455672.94 rows=500 width=12)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..8.00 rows=500 width=8)
-> Hash (cost=455658.03..455658.03 rows=3 width=12)
-> Partial HashAggregate (cost=455658.00..455658.03
rows=3 width=12)
Group Key: t2.b
-> Hash Join (cost=14.25..316768.56
rows=27777887 width=8)
Hash Cond: (t3.b = t4.b)
-> Nested Loop (cost=0.00..3767.25
rows=166666 width=8)
Join Filter: (t2.b <> t3.b)
-> Seq Scan on t t2
(cost=0.00..8.00 rows=500 width=4)
-> Materialize (cost=0.00..10.50
rows=500 width=4)
-> Seq Scan on t t3
(cost=0.00..8.00 rows=500 width=4)
-> Hash (cost=8.00..8.00 rows=500 width=8)
-> Seq Scan on t t4
(cost=0.00..8.00 rows=500 width=8)
(17 rows)
For the grouped relation {t2 t3 t4}, Plan 1 chose the path
"PartialAgg(t3/t4) JOIN t2", while Plan 2 chose the path
"PartialAgg(t2/t3/t4)".
The first path has larger row count (1000) and lower cost (1409.66).
The second path has smaller row count (3) and higher cost (455658.03).
Executing these two plans shows that Plan 2 is slower than Plan 1.
-- Plan 1
Execution Time: 286.860 ms
-- Plan 2
Execution Time: 27109.744 ms
I think we may need to take the position in the join tree into account
when applying this heuristic. At lower levels, we should prefer paths
with smaller row counts, while at higher levels, we should prefer
paths with lower costs. However, it's unclear to me how we should
define "lower" and "higher" - how low is 'low' and how high is 'high'.
I admit it's not so clear-cut when the row counts are close. If
PartialAgg(t1 JOIN t2) JOIN t3 has a very similar row count to PartialAgg(t1
JOIN t3) JOIN t2, can we categorically pick whichever one has the
lower row count and forget about the other? I'm not sure. But I have
an uncomfortable feeling that if we can't, we're going to have an
explosion in the number of paths we have to generate even if we avoid
an explosion in the number of RelOptInfos we generate.For example, consider:
SELECT ... FROM fact f, dim1, dim2, dim3, dim4
WHERE f.dim1_id = dim1.id AND f.dim2_id = dim2.id
AND f.dim3_id = dim3.id AND f.dim4_id = dim4.id
GROUP BY f.something;
Let's assume that each dimN table has PRIMARY KEY (id). Because of the
primary keys, it's only sensible to consider partial aggregation for
subsets of rels that include f; and it doesn't make sense to consider
partially aggregating after joining all 5 tables because at that point
we should just do a single-step aggregation. So, the partially
grouped-rel for {f,dim1,dim2,dim3,dim4} can contain paths generated in
15 different ways, because we can join f to any proper subset of
{dim1,dim2,dim3,dim4} before partially aggregating and then to the
remainder after partially aggregating. But that feels like we're
re-performing essentially the same join search 16 times which seems
super-expensive. I can't quite say that the work is useless or that I
have a better idea, but I guess there will be a lot of cases where all
16 join searches produce the same results, or most of them do. It
doesn't feel to me like checking through all of those possibilities is
a good expenditure of planner effort.
Yeah, you're right that the join search process for grouped paths
basically mirrors what we do for non-grouped paths, which indeed
involves a lot of planner effort. I've been exploring potential
heuristics to limit the search space for grouped paths, but so far, I
haven't found any effective solutions. Currently, the heuristic used
in the patch is to only consider grouped paths that dramatically
reduce the number of rows. All others are just discarded. The
rationale is that if a grouped path does not reduce the number of rows
enough, it is highly unlikely to result in a competitive final plan
during the upper planning stages, so it doesn't make much sense to
consider it. The current threshold is set to 50%, meaning that if the
number of rows returned by PartialAgg(t1 JOIN t2) is not less than 50%
of the rows returned by (t1 JOIN t2), no Aggregate paths will be
generated on top of the t1/t2 join. If we notice significant
regressions in planning time, we might consider further increasing
this threshold, say, to 80%, so that only grouped paths that reduce
the rows by more than 80% will be considered. This heuristic also
ensures that, once a plan with eager aggregation is chosen, it is
highly likely to result in performance improvements, due to the
significant data reduction before joins.
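As a rough illustration of how this cutoff behaves, with the test
table from upthread we can approximate the row reduction that a
partial aggregation keyed on t.b would achieve by comparing the
number of distinct grouping values to the number of input rows (just
a back-of-the-envelope check, not what the planner actually runs):
select count(*) as input_rows,
       count(distinct b) as output_groups,
       count(distinct b)::numeric / count(*) as ratio
from t;
With the 50% threshold, grouped paths would be considered only if
this ratio falls below 0.5, i.e. if the average group size exceeds
two.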
I took a look at the paper you linked in the original post, but
unfortunately it doesn't seem to say much about how to search the plan
space efficiently. I wonder if other systems perform a search that is
as exhaustive as the one that you are proposing to perform here, or
whether they apply some heuristics to limit the search space, and if
so, what those heuristics are.
Unfortunately, I don't have much knowledge about other systems. It
would be really helpful if anyone could share some insights on how
other systems handle this.
Thanks
Richard
On Tue, Jan 21, 2025 at 2:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
However, a partial-aggregation path does not generate the same data
as an unaggregated path, no matter how fuzzy you are willing to be
about the concept. So I'm having a very hard time accepting that
it ought to be part of the same RelOptInfo, and thus I don't really
buy that annotating paths with a GroupPathInfo is the way forward.
Agreed. I think one point I failed to make myself clear on is that
I've never intended to put a partial-aggregation path and an
unaggregated path into the same RelOptInfo. One of the basic designs
of this patch is that partial-aggregation paths are placed in a
separate category of RelOptInfos, which I call "grouped relations"
(though I admit that's not the best name). This ensures that we never
compare a partial-aggregation path with an unaggregated path during
scan/join planning, because I am certain that the two categories of
paths are not comparable.
Regarding the GroupPathInfo proposal, my intention is to add a valid
GroupPathInfo only for the partial-aggregation paths. The goal is to
ensure that partial-aggregation paths within this category are
compared only if their partial aggregations are at the same location.
To be honest, I still doubt that this is necessary. I have two main
reasons for this.
1.
For a partial-aggregation path, the location where we place the
partial aggregation does not impose any restrictions on further
planning. This is different from the parameterized path case. If two
parameterized paths are equal on every other figure of merit, we will
choose the one with fewer required outer rels, as it means fewer join
restrictions on upper planning. However, for partial-aggregation
paths, we do not have a preference regarding the location of the
partial aggregation. For instance, for path "A JOIN PartialAgg(B)
JOIN C" and path "PartialAgg(A JOIN B) JOIN C", if one path dominates
the other on every figure of merit, it seems to me that there's no
point in keeping the less favorable one, although they have their
partial aggregations at different join levels.
2.
A partial-aggregation path of a rel essentially yields an aggregated
form of that rel's row set. The difference between the row sets
yielded by paths with different locations of partial aggregation is
primarily about the different degrees to which the rows are
aggregated. These sets are fundamentally homogeneous.
In summary, in my own opinion, I think the partial-aggregation paths
of the same "grouped relation" are comparable, regardless of the
position of the partial aggregation within the path tree. So I think
we should put them into the same RelOptInfo.
Of course, I could be very wrong about this. I would greatly
appreciate hearing others' thoughts on this.
Thanks
Richard
On Tue, Jan 21, 2025 at 3:33 AM Richard Guo <guofenglinux@gmail.com> wrote:
I've been thinking about this proposal, and it's quite appealing. It
would significantly reduce both the planning effort and implementation
complexity, while still yielding reasonable planning results.
One concern I have with this proposal is that, as we climb up higher
and higher in the join tree, the assumption that a path with smaller
row count and higher cost is better than one with larger row count and
lower cost may gradually no longer hold. It's true that a path with a
smaller row count is generally better for upper join nodes, as it
feeds fewer rows to upper join nodes. However, as there are fewer and
fewer upper join nodes left, the efficiency gained from the smaller
row count could likely no longer justify the high cost of that path
itself.
Here's an example I found that can help illustrate what I mean.
Thanks for the example. What seems to be happening here is that each
of the three joins increases the number of rows by a multiple of
either 166 or 333. Aggregating reduces the number of rows to 3. I am
not sure that we should be too concerned about this kind of case,
because I don't think it will be common to have multiple joins that
dramatically increase the row count. If you did have that, you must
want to aggregate multiple times. We don't have the code for an
IntermediateAggregate or CombineAggregate node right now, I believe,
but in this query it would likely make sense to apply such a step
after every join; then you'd never have more than three rows.
Honestly, I'm not sure how much we should worry about a case like
this. I think that if a user is writing queries that use joins to
vastly inflate the row count and then aggregate the result, perhaps
they need to think about rewriting the queries. In this instance, it
feels a bit like the user is emulating multiplication using an
iterated SUM(), which is probably never going to work out all that
well.
But I bet it's possible to construct an example using only
row-reducing joins. Let's say we start with 10k rows that aggregate to
10 rows; after performing a join, we end up with 9k rows that
aggregate to 9 rows. So if we partially aggregate first, we have to
aggregate 1000 extra rows, but if we join first, we have to join 1000
extra rows. I don't think we can say a priori which will be cheaper,
but my idea would make the path that partially aggregates after the
join win unconditionally.
Yeah, you're right that the join search process for grouped paths
basically mirrors what we do for non-grouped paths, which indeed
involves a lot of planner effort. I've been exploring potential
heuristics to limit the search space for grouped paths, but so far, I
haven't found any effective solutions. Currently, the heuristic used
in the patch is to only consider grouped paths that dramatically
reduce the number of rows. All others are just discarded. The
rationale is that if a grouped path does not reduce the number of rows
enough, it is highly unlikely to result in a competitive final plan
during the upper planning stages, so it doesn't make much sense to
consider it. The current threshold is set to 50%, meaning that if the
number of rows returned by PartialAgg(t1 JOIN t2) is not less than 50%
of the rows returned by (t1 JOIN t2), no Aggregate paths will be
generated on top of the t1/t2 join. If we notice significant
regressions in planning time, we might consider further increasing
this threshold, say, to 80%, so that only grouped paths that reduce
the rows by more than 80% will be considered. This heuristic also
ensures that, once a plan with eager aggregation is chosen, it is
highly likely to result in performance improvements, due to the
significant data reduction before joins.
To be honest, I was quite surprised this was a percentage like 50% or
80% and not a multiple like 2 or 5. And I had thought the multiplier
might even be larger, like 10 or more. The thing is, 50% means we only
have to form 2-item groups in order to justify aggregating twice.
Maybe SUM() is cheap enough to justify that treatment, but a more
expensive aggregate might not be, especially things like string_agg()
or array_agg() where aggregation creates bigger objects.
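For instance, just to illustrate roughly how the aggregated values
compare in size (the exact numbers will vary):
select pg_column_size(sum(i)) as sum_bytes,
       pg_column_size(array_agg(i)) as array_agg_bytes
from generate_series(1, 1000) i;
The sum stays a single fixed-size value no matter how many rows are
aggregated, while the array grows linearly with the group size.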
Another thing to consider is that when the number of groups is small
enough that we don't need to do a Sort+GroupAggregate, it doesn't seem
so bad to perform marginally-useful partial aggregation, but sometimes
that won't be the case. For example, imagine that the user wants to
join orders to order_lines and then compute SUM(order_lines.quantity)
for each orders.customer_id. If the size of the order_lines table is
large relative to work_mem, we're going to need to sort it in order
to partially aggregate, which is expensive. If it turns out that the
orders table is also quite big, then maybe we'll end up performing a
merge join and the same sort order can be used for both operations,
but if not, we could've just done a hash join with orders as the build
table. In that kind of case, partial aggregation has to save quite a
lot to justify itself.
Now, maybe we shouldn't worry about that when applying this heuristic
cutoff; after all, it's the job of the cost model to understand that
sorting is expensive, and this cutoff should just be there to make
sure we don't even try the cost model in cases where it's clearly
unpromising. But I do suspect that in queries where the average group
size is 2, this will often be a marginal technique. In addition to the
problems already mentioned, it could be that the average group size is
2 but a lot of groups are actually of size 1 and then there are some
larger groups. In such cases I'm even less sure that the partial
aggregation technique will be a winner. Building many 1-element groups
sounds inefficient.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jan 22, 2025 at 1:36 AM Robert Haas <robertmhaas@gmail.com> wrote:
Thanks for the example. What seems to be happening here is that each
of the three joins increases the number of rows by a multiple of
either 166 or 333. Aggregating reduces the number of rows to 3. I am
not sure that we should be too concerned about this kind of case,
because I don't think it will be common to have multiple joins that
dramatically increase the row count. If you did have that, you must
want to aggregate multiple times. We don't have the code for an
IntermediateAggregate or CombineAggregate node right now, I believe,
but in this query it would likely make sense to apply such a step
after every join; then you'd never have more than three rows.
Haha, I did once think about the concept of multi-stage aggregations
while working on this patch. While testing this patch and trying to
figure out where placing the partial aggregation would bring the most
benefit, I noticed that a potentially effective approach could be
this: every time the row count increases to a certain point as we join
more and more tables, we perform one aggregation to deflate it, and
then wait for it to grow again before deflating it once more.
This approach would require injecting multiple intermediate
aggregation nodes into the path tree, for which we currently lack the
necessary architecture. As a result, I didn't pursue this idea
further. However, I'm really glad you mentioned this approach, though
it's still unclear whether it's a feasible or reasonable idea.
Honestly, I'm not sure how much we should worry about a case like
this. I think that if a user is writing queries that use joins to
vastly inflate the row count and then aggregate the result, perhaps
they need to think about rewriting the queries. In this instance, it
feels a bit like the user is emulating multiplication using an
iterated SUM(), which is probably never going to work out all that
well.
I don't have much experience with end-user scenarios, so I'm not sure
if it's common to have queries where the row count increases with more
and more tables joined.
But I bet it's possible to construct an example using only
row-reducing joins. Let's say we start with 10k rows that aggregate to
10 rows; after performing a join, we end up with 9k rows that
aggregate to 9 rows. So if we partially aggregate first, we have to
aggregate 1000 extra rows, but if we join first, we have to join 1000
extra rows. I don't think we can say a priori which will be cheaper,
but my idea would make the path that partially aggregates after the
join win unconditionally.
Yeah, this is the concern I raised upthread: the efficiency gained
from a path having a smaller row count may not always justify the high
cost of the path itself, especially as we move higher in the join
tree.
To be honest, I was quite surprised this was a percentage like 50% or
80% and not a multiple like 2 or 5. And I had thought the multiplier
might even be larger, like 10 or more. The thing is, 50% means we only
have to form 2-item groups in order to justify aggregating twice.
Maybe SUM() is cheap enough to justify that treatment, but a more
expensive aggregate might not be, especially things like string_agg()
or array_agg() where aggregation creates bigger objects.
Hmm, if I understand correctly, the "percentage" and the "multiple"
work in the same way. Percentage 50% and multiple 2 both mean that
the average group size is 2, and percentage 90% and multiple 10 both
mean that the average group size is 10. In general, this relationship
should hold: percentage = 1 - 1/multiple. However, I might not have
grasped your point correctly.
Another thing to consider is that when the number of groups is small
enough that we don't need to do a Sort+GroupAggregate, it doesn't seem
so bad to perform marginally-useful partial aggregation, but sometimes
that won't be the case. For example, imagine that the user wants to
join orders to order_lines and then compute SUM(order_lines.quantity)
for each orders.customer_id. If the size of the order_lines table is
large relative to work_mem, we're going to need to sort it in order
to partially aggregate, which is expensive. If it turns out that the
orders table is also quite big, then maybe we'll end up performing a
merge join and the same sort order can be used for both operations,
but if not, we could've just done a hash join with orders as the build
table. In that kind of case, partial aggregation has to save quite a
lot to justify itself.
Now, maybe we shouldn't worry about that when applying this heuristic
cutoff; after all, it's the job of the cost model to understand that
sorting is expensive, and this cutoff should just be there to make
sure we don't even try the cost model in cases where it's clearly
unpromising. But I do suspect that in queries where the average group
size is 2, this will often be a marginal technique. In addition to the
problems already mentioned, it could be that the average group size is
2 but a lot of groups are actually of size 1 and then there are some
larger groups. In such cases I'm even less sure that the partial
aggregation technique will be a winner. Building many 1-element groups
sounds inefficient.
Yeah, as you summarized, this heuristic is primarily used to discard
unpromising paths, ensuring they aren't considered further. For the
paths that pass this heuristic, the cost model will then determine the
appropriate aggregation and join methods. If we take this into
consideration when applying the heuristic, it seems to me that we
would essentially be duplicating the work that the cost model
performs, which doesn't seem necessary.
I think you are right that in cases where a lot of groups are actually
of size 1 and then there are some larger groups, the partial
aggregation may not be a win. Perhaps we can do better in this if we
have the techniques to estimate the distribution of data across
different groups or to predict how skewed the data might be. It seems
that we don't have such techniques at the moment. This also reminds
me of a similar challenge when calculating the startup cost of
incremental sort. I looked into cost_incremental_sort() and found
that we're currently using the average group size to estimate the
startup cost (please correct me if I'm wrong).
group_tuples = input_tuples / input_groups;
I think this may also suffer from data skew across different groups.
With the mentioned techniques, I believe we could improve the cost
estimation for incremental sort as well.
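For instance, here's a contrived distribution where the average
group size is 10 even though 99 of the 100 groups are singletons:
select b, count(*)
from (select case when i <= 99 then i else 100 end as b
      from generate_series(1, 1000) i) s
group by b
order by count(*) desc
limit 3;
The estimate group_tuples = 10 would be far off for nearly every
group here.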
If I understand correctly, your main concern is the threshold being
set to 2, rather than the heuristic itself, right? Do you think
increasing this threshold to 10 or a larger value would help mitigate
the issue?
Thanks
Richard
On Wed, Jan 22, 2025 at 1:48 AM Richard Guo <guofenglinux@gmail.com> wrote:
This approach would require injecting multiple intermediate
aggregation nodes into the path tree, for which we currently lack the
necessary architecture. As a result, I didn't pursue this idea
further. However, I'm really glad you mentioned this approach, though
it's still unclear whether it's a feasible or reasonable idea.
I think the biggest question in my mind is really whether we can
accurately judge when such a strategy is likely to be a win. In this
instance it looks like we could have figured it out, but as we've
discussed, I fear a lot of estimates will be inaccurate. If we knew
they were going to be good, then I see no reason not to apply the
technique when it's sensible.
I don't have much experience with end-user scenarios, so I'm not sure
if it's common to have queries where the row count increases with more
and more tables joined.
I don't think it's very common to see it increase as dramatically as
in your test case.
To be honest, I was quite surprised this was a percentage like 50% or
80% and not a multiple like 2 or 5. And I had thought the multiplier
might even be larger, like 10 or more. The thing is, 50% means we only
have to form 2-item groups in order to justify aggregating twice.
Maybe SUM() is cheap enough to justify that treatment, but a more
expensive aggregate might not be, especially things like string_agg()
or array_agg() where aggregation creates bigger objects.
Hmm, if I understand correctly, the "percentage" and the "multiple"
work in the same way. Percentage 50% and multiple 2 both mean that
the average group size is 2, and percentage 90% and multiple 10 both
mean that the average group size is 10. In general, this relationship
should hold: percentage = 1 - 1/multiple. However, I might not have
grasped your point correctly.
Yes, they're equivalent. However, a percentage to me suggests that we
think that the meaningful values might be something like 20%, 50%,
80%; whereas with a multiplier someone might be more inclined to think
of values like 10, 100, 1000. You can definitely write those values as
90%, 99%, 99.9%; however, it seems less natural to me to express it
that way when we think the value will be quite close to 1. The fact
that you chose a percentage suggested to me that you were aiming for a
less-strict threshold than I had supposed we would want.
Yeah, as you summarized, this heuristic is primarily used to discard
unpromising paths, ensuring they aren't considered further. For the
paths that pass this heuristic, the cost model will then determine the
appropriate aggregation and join methods. If we take this into
consideration when applying the heuristic, it seems to me that we
would essentially be duplicating the work that the cost model
performs, which doesn't seem necessary.
Well, I think we do ideally want heuristics that can reject
unpromising paths earlier. The planning cost of this is really quite
high. But I'm not sure how far we can get with this particular
heuristic. True, we could raise it to a larger value, and that might
help to rule out unpromising paths earlier. But I fear you'll quickly
find examples where it also rules out promising paths early. A good
heuristic is easy to compute and highly accurate. This heuristic is
easy to compute, but the accuracy is questionable.
--
Robert Haas
EDB: http://www.enterprisedb.com
I've switched back to this thread and will begin by working through
the key concerns that were previously raised.
The first concern is the lack of a proof demonstrating the correctness
of this transformation. To address this, I plan to include a detailed
proof in the README, along the lines of the following.
====== proof start ======
To prove that the transformation is correct, we partition the tables
in the FROM clause into two groups: those that contain at least one
aggregation column, and those that do not contain any aggregation
columns. Each group can be treated as a single relation formed by the
Cartesian product of the tables within that group. Therefore, without
loss of generality, we can assume that the FROM clause contains
exactly two relations, R1 and R2, where R1 represents the relation
containing all aggregation columns, and R2 represents the relation
without any aggregation columns.
Let the query be of the form:
SELECT G, AGG(A)
FROM R1 JOIN R2 ON J
GROUP BY G;
where G is the set of grouping keys that may include columns from R1
and/or R2; AGG(A) is an aggregate function over columns A from R1; J
is the join condition between R1 and R2.
The transformation of eager aggregation is:
GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
=
GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
JOIN R2 ON J)
This equivalence holds under the following conditions:
1) AGG is decomposable, meaning that it can be computed in two stages:
a partial aggregation followed by a final aggregation;
2) The set G1 used in the pre-aggregation of R1 includes:
* all columns from R1 that are part of the grouping keys G, and
* all columns from R1 that appear in the join condition J.
3) The grouping operator for any column in G1 must be compatible with
the operator used for that column in the join condition J.
Since G1 includes all columns from R1 that appear in either the
grouping keys G or the join condition J, all rows within each partial
group have identical values for both the grouping keys and the
join-relevant columns from R1, assuming compatible operators are used.
As a result, the rows within a partial group are indistinguishable in
terms of their contribution to the aggregation and their behavior in
the join. This ensures that all rows in the same partial group share
the same "destiny": they either all match or all fail to match a given
row in R2. Because the aggregate function AGG is decomposable,
aggregating the partial results after the join yields the same final
result as aggregating after the full join, thereby preserving query
semantics.
Q.E.D.
====== proof end ======
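To make the equivalence concrete, here is a minimal SQL instance of
the transformation, using hypothetical relations r1(j, a) and
r2(j, g), with G = {r2.g}, A = r1.a, J being r1.j = r2.j, and hence
G1 = {r1.j}:
-- original form
select r2.g, sum(r1.a)
from r1 join r2 on r1.j = r2.j
group by r2.g;
-- eagerly aggregated form
select r2.g, sum(r1p.agg_a)
from (select j, sum(a) as agg_a from r1 group by j) r1p
join r2 on r1p.j = r2.j
group by r2.g;
SUM is decomposable because summing the per-group partial sums gives
the same result as summing the original rows, so the outer SUM(agg_a)
correctly finalizes the pushed-down partial aggregation.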
The second concern is that a RelOptInfo representing a grouped
relation may include paths that produce different row sets due to
partial aggregation being applied at different join levels. This
potentially violates a fundamental assumption in the planner.
Additionally, the patch currently performs an exhaustive search by
exploring partial aggregation at every possible join level, leading to
excessive planning effort, which may not be justified by the
cost-benefit ratio.
To address these concerns, I'm thinking that maybe we can adopt a
strategy where partial aggregation is only pushed to the lowest
possible level in the join tree that is deemed useful. In other
words, if we can build a grouped path like "AGG(B) JOIN A" -- and
AGG(B) yields a significant reduction in row count -- we skip
exploring alternatives like "AGG(A JOIN B)".
This is somewhat analogous to how we handle qual clauses: we only push
a qual clause down to the lowest scan or join level that includes all
the relations it references -- following the "filter early, join late"
principle. For example, if predicate Pb only references B, we only
consider "A JOIN sigma[Pb](B)" and skip "sigma[Pb](A JOIN B)". (Note
that if Pb involves costly functions and the join is highly selective,
we may want to apply the predicate after the join.)
This ensures that all grouped paths for the same grouped relation
produce the same set of rows (e.g., consider "A JOIN AGG(B) JOIN C"
vs. "AGG(B) JOIN C JOIN A"). As a result, we avoid the complexity of
comparing costs between different grouped paths of the same grouped
relation, and also eliminate the need for special handling of row
estimates on join paths. It also significantly reduces planning
effort.
While this approach may miss potentially more efficient plans where
applying partial aggregation at a higher join level would yield better
performance, it strikes a practical balance: we can still find plans
that outperform those without eager aggregation, without incurring
excessive planning overhead. As discussed earlier, it's uncommon in
practice to encounter multiple joins that dramatically inflate row
counts. So in most cases, pushing partial aggregation to the lowest
level where it offers a significant row count reduction tends to be
the most efficient strategy.
I think this heuristic serves as a good starting point, and we can
look into extending it with more advanced strategies as the feature
evolves.
Any thoughts?
Thanks
Richard
On Fri, Jun 13, 2025 at 4:41 PM Richard Guo <guofenglinux@gmail.com> wrote:
I've switched back to this thread and will begin by working through
the key concerns that were previously raised.
The first concern is the lack of a proof demonstrating the correctness
of this transformation. To address this, I plan to include a detailed
proof in the README, along the lines of the following.
The second concern is that a RelOptInfo representing a grouped
relation may include paths that produce different row sets due to
partial aggregation being applied at different join levels. This
potentially violates a fundamental assumption in the planner.
Additionally, the patch currently performs an exhaustive search by
exploring partial aggregation at every possible join level, leading to
excessive planning effort, which may not be justified by the
cost-benefit ratio.
To address these concerns, I'm thinking that maybe we can adopt a
strategy where partial aggregation is only pushed to the lowest
possible level in the join tree that is deemed useful. In other
words, if we can build a grouped path like "AGG(B) JOIN A" -- and
AGG(B) yields a significant reduction in row count -- we skip
exploring alternatives like "AGG(A JOIN B)".
Here is the patch based on the proposed ideas. It includes the proof
of correctness in the README and implements the strategy of pushing
partial aggregation only to the lowest applicable join level where it
is deemed useful. This is done by introducing a "Relids apply_at"
field to track that level and ensuring that partial aggregation is
applied only at the recorded "apply_at" level.
Additionally, this patch changes how grouped relations are stored.
Since each grouped relation represents a partially aggregated version
of a non-grouped relation, we now associate each grouped relation with
the RelOptInfo of the corresponding non-grouped relation. This
eliminates the need for a dedicated list of all grouped relations and
avoids list searches when retrieving a grouped relation.
It also addresses other previously raised concerns, such as the
potential memory blowout risks with large partial-aggregation values,
and includes improvements to comments and the commit message.
Another change is that this feature is now enabled by default.
Thanks
Richard
Attachments:
v17-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From fcdd75d824bc9ee65078ad2dc7337cca22eccf50 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v17] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 443 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 313 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 636 ++++++++
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/tools/pgindent/typedefs.list | 3 +
23 files changed, 3588 insertions(+), 63 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..b9f767df05d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3692,30 +3692,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 59a0874528a..780b4a9fed1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5470,6 +5470,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..e75bb41b58d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +602,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1357,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3417,319 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3889,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3913,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4803,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining a grouped input
+ * relation to a non-grouped one (either because neither input relation
+ * has a grouped counterpart, or because both do; joining two grouped
+ * relations is not supported), skip building the grouped join
+ * relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..7fa1e5099b1 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool has_internal_aggtranstype(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +632,315 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if the query has no GROUP BY clause.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate uses INTERNAL transition
+ * type.
+ *
+ * Although INTERNAL is marked as pass-by-value, it usually points to a
+ * large internal data structure (like those used by string_agg or
+ * array_agg). These transition states can grow large and their size is
+ * hard to estimate. Applying eager aggregation in such cases risks high
+ * memory usage since partial aggregation results might be stored in join
+ * hash tables or materialized nodes.
+ */
+ if (has_internal_aggtranstype(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * has_internal_aggtranstype
+ * Checks if any aggregate uses INTERNAL transition type.
+ */
+static bool
+has_internal_aggtranstype(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
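+ /*
+ * Otherwise the expression must be an Aggref: with the flags used
+ * above, pull_var_clause() returns only Vars, Aggrefs and
+ * GroupingFuncs, and the other two cases were handled earlier.
+ */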
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression. We do this in a
+ * second pass so that root->group_expr_list is left NIL if any
+ * grouping expression above turned out to be unsuitable.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..6289902fc93 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3982,9 +3981,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4066,23 +4063,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7027,16 +7017,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7149,7 +7165,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7167,7 +7183,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7175,7 +7191,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7217,19 +7233,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7269,6 +7283,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7279,6 +7294,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7300,11 +7324,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7529,6 +7555,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8092,13 +8163,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
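+ /*
+ * sortgrouprefs is a plain array rather than a Node tree, so the
+ * mutator cannot handle it; copy it manually.
+ */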
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..26127eb07d1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2790,8 +2790,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
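+ /*
+ * These paths can now be created below joins, so propagate any
+ * parameterization from the subpath.
+ */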
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +3045,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +3092,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +3253,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..c4054b5d03f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,7 +89,22 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+/*
+ * Minimum average group size required to consider applying eager aggregation.
+ *
+ * This helps avoid the overhead of eager aggregation when it does not offer
+ * significant row count reduction.
+ */
+#define EAGER_AGG_MIN_GROUP_SIZE 20.0
/*
* setup_simple_rel_arrays
@@ -276,6 +297,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +429,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we should not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +876,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1062,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2643,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one, whose
+ * reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must
+ * already have been created if that was possible. So we can just use
+ * the parent's RelAggInfo if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys: an aggregated row from the partial aggregation must match
+ * the other side of the join if and only if each row in the
+ * partial group does. This guarantees that all rows within the
+ * same partial group share the same 'destiny', which is crucial
+ * for correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
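+ /*
+ * 'lc' is non-NULL only if we broke out of the loop above, i.e., the
+ * Var was found in some aggregate expression. The Var then qualifies
+ * unless it also appears among the plain Vars of the targetlist and
+ * havingQual.
+ */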
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist.
+ * So we add "relation 0", which represents the final output, to the set
+ * of relids whose requirements are to be ignored.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
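+ /*
+ * The Var is needed by an upper join iff it is needed somewhere other
+ * than this rel and the final output.
+ */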
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f04bfedb2fd..5a6a3b7406e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 341f88adc87..00eaf4869e0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..1b03b5f03cf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -394,6 +394,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1022,6 +1031,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1095,6 +1112,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * EAGER_AGG_MIN_GROUP_SIZE, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3274,6 +3360,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..01a3532dc2e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..b62f22237b7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Ensure eager aggregation is enabled (it is on by default).
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY on another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index a424be2a6bf..929cab14c47 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Ensure eager aggregation is enabled (it is on by default).
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY on another matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When the GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When the GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..61b7e6ea049 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1111,6 +1112,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2464,6 +2466,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
On Thu, Jun 26, 2025 at 11:01 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is the patch based on the proposed ideas. It includes the proof
of correctness in the README and implements the strategy of pushing
partial aggregation only to the lowest applicable join level where it
is deemed useful. This is done by introducing a "Relids apply_at"
field to track that level and ensuring that partial aggregation is
applied only at the recorded "apply_at" level.

Additionally, this patch changes how grouped relations are stored.
Since each grouped relation represents a partially aggregated version
of a non-grouped relation, we now associate each grouped relation with
the RelOptInfo of the corresponding non-grouped relation. This
eliminates the need for a dedicated list of all grouped relations and
avoids list searches when retrieving a grouped relation.

It also addresses other previously raised concerns, such as the
potential memory blowout risks with large partial-aggregation values,
and includes improvements to comments and the commit message.

Another change is that this feature is now enabled by default.

This patch no longer applies; here's a rebased version. Nothing
essential has changed.
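
For anyone trying the rebased patch, the new enable_eager_aggregate GUC
makes it easy to compare plans. A minimal sketch, using the tables from
the added eager_aggregate regression test (the exact plan shape will of
course depend on costing):

SET enable_eager_aggregate = on;  -- the default in this version
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.x, sum(t2.y)
FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
GROUP BY t1.x;

With eager aggregation chosen, one would expect a Finalize GroupAggregate
or Finalize HashAggregate above the join, with a partial aggregation
pushed below it on one input; with the GUC set to off, the plan falls
back to a single aggregation above the join.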
Thanks
Richard
Attachments:
v18-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From 23ab3a8c476e130a93b843c6afcba149641169fb Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v18] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 452 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 313 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 636 ++++++++
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/tools/pgindent/typedefs.list | 3 +
23 files changed, 3597 insertions(+), 63 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 4b6e49a5d95..8dea3dee667 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3713,30 +3713,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 20ccb2d6b54..395bca6cf95 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5474,6 +5474,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
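+As a concrete illustration of this transformation (with AGG = SUM, which
+is decomposable), consider:
+
+SELECT R2.g, SUM(R1.a)
+FROM R1 JOIN R2 ON R1.j = R2.j
+GROUP BY R2.g;
+
+Here G = {R2.g} and J is (R1.j = R2.j), so G1 = {R1.j}, since R1.j is the
+only R1 column that appears in G or J. The rewritten query is:
+
+SELECT R2.g, SUM(agg_a)
+FROM (SELECT j, SUM(a) AS agg_a FROM R1 GROUP BY j) r1
+JOIN R2 ON r1.j = R2.j
+GROUP BY R2.g;
+
+The equivalence holds because SUM can be computed in two stages (a final
+SUM over the partial SUMs), and the equality operator used to group r1.j
+is the same operator used in the join condition.
+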
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
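+As a concrete example of this restriction, consider:
+
+SELECT R2.b, COUNT(*)
+FROM R1 LEFT JOIN R2 ON R1.j = R2.j
+GROUP BY R2.b;
+
+Here R2 is on the nullable side of the join. An R1 row with no join
+partner contributes a NULL-extended row to the "b IS NULL" group of the
+top-level aggregation; but that row does not exist yet when R2 alone is
+partially aggregated, so pushing partial aggregation down to R2 could
+place rows in the wrong groups and produce incorrect counts.
+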
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..ac922dbf56a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +602,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1357,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3417,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3898,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3922,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4812,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..3fbccc67190 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool has_internal_aggtranstype(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +632,315 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate uses INTERNAL transition
+ * type.
+ *
+ * Although INTERNAL is marked as pass-by-value, it usually points to a
+ * large internal data structure (like those used by string_agg or
+ * array_agg). These transition states can grow large and their size is
+ * hard to estimate. Applying eager aggregation in such cases risks high
+ * memory usage since partial aggregation results might be stored in join
+ * hash tables or materialized nodes.
+ */
+ if (has_internal_aggtranstype(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * has_internal_aggtranstype
+ * Checks if any aggregate uses INTERNAL transition type.
+ */
+static bool
+has_internal_aggtranstype(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c989e72cac5..6e1d01adbfa 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3970,9 +3969,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4054,23 +4051,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7015,16 +7005,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7137,7 +7153,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7155,7 +7171,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7163,7 +7179,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7205,19 +7221,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7257,6 +7271,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7267,6 +7282,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7288,11 +7312,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7517,6 +7543,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8080,13 +8151,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9cc602788ea..71d1096012c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2813,8 +2813,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3069,8 +3068,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3117,8 +3115,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3279,8 +3276,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..c4054b5d03f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,7 +89,22 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+/*
+ * Minimum average group size required to consider applying eager aggregation.
+ *
+ * This helps avoid the overhead of eager aggregation when it does not offer
+ * significant row count reduction.
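+ *
+ * For example, if a relation's 1000 input rows are estimated to collapse
+ * into 50 groups, the average group size is 1000 / 50 = 20.0, which just
+ * meets this threshold.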
+ */
+#define EAGER_AGG_MIN_GROUP_SIZE 20.0
/*
* setup_simple_rel_arrays
@@ -276,6 +297,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +429,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+	/*
+	 * We should not have arrived here unless aggregate expressions and
+	 * grouping expressions are available.
+	 */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
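+	/*
+	 * Start with an exact flat copy of the input rel, then reset the fields
+	 * that must not be shared with it.
+	 */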
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +876,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1062,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2643,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ *		grouped paths.  The given relation is the non-grouped one, whose
+ *		reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+	/*
+	 * If this is a child rel, the grouped rel for its parent rel must
+	 * already have been created if that was possible.  So we can simply
+	 * reuse the parent's RelAggInfo, if there is one, with appropriate
+	 * variable substitutions.
+	 */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
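+		/*
+		 * Re-estimate the number of groups, since the child rel's row count
+		 * differs from its parent's.
+		 */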
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+	/* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
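+
+		/*
+		 * Mark the Aggref as partial so that it produces the aggregate's
+		 * transition state (serialized if needed) rather than the final
+		 * result.
+		 */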
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+	/*
+	 * Check to see if the given relation is on the nullable side of an outer
+	 * join.  In that case, we cannot push a partial aggregation down to the
+	 * relation: the NULL-extended rows produced by the outer join are not
+	 * available when the partial aggregation is performed, whereas with a
+	 * non-eager-aggregation plan they are available to the top-level
+	 * aggregation.  Pushing down anyway could group the rows differently
+	 * than expected, or produce incorrect values from the aggregate
+	 * functions.
+	 */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
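+
+	/*
+	 * For example, if this relation is t1 in "t1 RIGHT JOIN t2", the join
+	 * emits rows in which t1's columns are NULL-extended for unmatched t2
+	 * rows; a partial aggregation computed over t1 alone would never see
+	 * those rows.
+	 */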
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+		/*
+		 * Give up if any aggregate requires relations other than the current
+		 * one.  If the aggregate requires the current relation plus
+		 * additional relations, partially aggregating the current relation
+		 * would merge rows that the higher-level aggregate still needs to
+		 * see individually, reducing the number of input rows it receives.
+		 * If the aggregate does not require the current relation at all, the
+		 * relation should not be grouped, as we do not support joining two
+		 * grouped relations.
+		 */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
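+
+			/*
+			 * (For example, numeric fails this test: equal numeric values
+			 * can differ in display scale and hence in binary
+			 * representation.)
+			 */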
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
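+	/*
+	 * lc is non-NULL only if we broke out of the loop above, i.e. the Var
+	 * was found in some aggregate.  In addition, require that the Var not
+	 * appear as a plain Var in the targetlist or havingQual.
+	 */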
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+	/*
+	 * Note that when checking whether the Var is needed by joins above this
+	 * rel, we want to ignore the case where the Var is needed only in the
+	 * final targetlist.  attr_needed represents that case as "relation 0",
+	 * so add relation 0 to the relids being subtracted.
+	 */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+	/* no match found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d14b1678e7f..5ef8b824a7b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..0eb755d61da 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e5dd15098f6..c9df12aa38e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1024,6 +1033,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1097,6 +1114,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * EAGER_AGG_MIN_GROUP_SIZE, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+	/* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3276,6 +3362,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list.  Each AggClauseInfo records the set of relations
+ * referenced by the aggregate expression; for instance, for avg(t1.c + t2.c)
+ * agg_eval_at contains the relids of both t1 and t2.  This information is
+ * used to determine how far the aggregate can safely be pushed down in the
+ * join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..01a3532dc2e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..b62f22237b7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Make sure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, partially aggregate the result, join it to the other
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, partially aggregate the result, join it to the remaining
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is
+-- performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY on a different column that matches the partition key via the
+-- join clause
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial
+-- aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is
+-- performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8656419cb6..37053d9d769 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2471,6 +2473,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
On Thu, Jul 24, 2025 at 12:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
This patch no longer applies; here's a rebased version. Nothing
essential has changed.
Based on some off-list testing by Matheus (CC'ed), several TPC-DS
queries that used to apply eager aggregation no longer do, which
suggests that the v18 patch is too strict about when eager aggregation
can be used.
I looked into query 4 and query 11, and found two reasons why they no
longer apply eager aggregation with v18.
* The has_internal_aggtranstype() check.
To avoid potential memory blowout risks from large partial aggregation
values, v18 avoids applying eager aggregation if any aggregate uses an
INTERNAL transition type, as this typically indicates a large internal
data structure (as in string_agg or array_agg). However, this also
excludes aggregates like avg(numeric) and sum(numeric), which are
actually safe to use with eager aggregation.
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
* The EAGER_AGG_MIN_GROUP_SIZE threshold
This threshold defines the minimum average group size required to
consider applying eager aggregation. It was previously set to 2, but
in v18 it was increased to 20 to be cautious about planning overhead.
This change was a snap decision though, without any profiling or data
to back it.
Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
needed to consider eager aggregation for them. The resulting plans
show nice performance improvements without any measurable increase in
planning time. So, I'm inclined to lower the threshold to 10 for now.
(Wondering whether we should make this threshold a GUC, so users can
adjust it based on their needs.)
With these two changes, here are the planning and execution time for
queries 4 and 11 (scale factor 1) on my snail-paced machine, with and
without eager aggregation.
query 4:
-- without eager aggregation
Planning Time: 6.765 ms
Execution Time: 34941.713 ms
-- with eager aggregation
Planning Time: 6.674 ms
Execution Time: 13994.183 ms
query 11:
-- without eager aggregation
Planning Time: 3.757 ms
Execution Time: 20888.076 ms
-- with eager aggregation
Planning Time: 3.747 ms
Execution Time: 7449.522 ms
Any comments on these two changes?
Thanks
Richard
On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote:
On Thu, Jul 24, 2025 at 12:21 PM Richard Guo <guofenglinux@gmail.com> wrote:
This patch no longer applies; here's a rebased version. Nothing
essential has changed.
Based on some off-list testing by Matheus (CC'ed), several TPC-DS
queries that used to apply eager aggregation no longer do, which
suggests that the v18 patch is too strict about when eager aggregation
can be used.
I looked into query 4 and query 11, and found two reasons why they no
longer apply eager aggregation with v18.
* The has_internal_aggtranstype() check.
To avoid potential memory blowout risks from large partial aggregation
values, v18 avoids applying eager aggregation if any aggregate uses an
INTERNAL transition type, as this typically indicates a large internal
data structure (as in string_agg or array_agg). However, this also
excludes aggregates like avg(numeric) and sum(numeric), which are
actually safe to use with eager aggregation.
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
I think it makes sense to me. I'm just wondering if we should follow an
"allow" or "don't-allow" strategy. I mean, instead of listing the
aggregate functions that are not allowed, we could list the functions
that are actually allowed to use eager aggregation; that way we ensure
that eager aggregation works properly for every function it is enabled
for.
* The EAGER_AGG_MIN_GROUP_SIZE threshold
This threshold defines the minimum average group size required to
consider applying eager aggregation. It was previously set to 2, but
in v18 it was increased to 20 to be cautious about planning overhead.
This change was a snap decision though, without any profiling or data
to back it.
Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
needed to consider eager aggregation for them. The resulting plans
show nice performance improvements without any measurable increase in
planning time. So, I'm inclined to lower the threshold to 10 for now.
(Wondering whether we should make this threshold a GUC, so users can
adjust it based on their needs.)
Having a GUC sounds like a good idea to me TBH. This threshold may
vary from workload to workload.
With these two changes, here are the planning and execution time for
queries 4 and 11 (scale factor 1) on my snail-paced machine, with and
without eager aggregation.
query 4:
-- without eager aggregation
Planning Time: 6.765 ms
Execution Time: 34941.713 ms
-- with eager aggregation
Planning Time: 6.674 ms
Execution Time: 13994.183 ms
query 11:
-- without eager aggregation
Planning Time: 3.757 ms
Execution Time: 20888.076 ms
-- with eager aggregation
Planning Time: 3.747 ms
Execution Time: 7449.522 ms
Any comments on these two changes?
It sounds like a good way to go; I'm looking forward to the next
patch version so I can run some more tests.
Thanks
--
Matheus Alcantara
On Wed, Aug 6, 2025 at 10:44 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote:
* The has_internal_aggtranstype() check.
To avoid potential memory blowout risks from large partial aggregation
values, v18 avoids applying eager aggregation if any aggregate uses an
INTERNAL transition type, as this typically indicates a large internal
data structure (as in string_agg or array_agg). However, this also
excludes aggregates like avg(numeric) and sum(numeric), which are
actually safe to use with eager aggregation.
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
I think it makes sense to me. I'm just wondering if we should follow an
"allow" or "don't-allow" strategy. I mean, instead of listing the
aggregate functions that are not allowed, we could list the functions
that are actually allowed to use eager aggregation; that way we ensure
that eager aggregation works properly for every function it is enabled
for.
I ended up still checking for INTERNAL transition types, but carved out
an explicit exception for aggregates that use the F_NUMERIC_AVG_ACCUM
transition function, on the assumption that avg(numeric) and
sum(numeric) are safe in this context. This might still be overly
strict, but I prefer to be on the safe side for now.
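In code form, the check presumably reduces to something like the sketch
below. The helper name and call site are my own assumptions, not the
patch's actual code; INTERNALOID and F_NUMERIC_AVG_ACCUM are the real
constants from pg_type_d.h and fmgroids.h.

#include "postgres.h"

#include "catalog/pg_type_d.h"  /* INTERNALOID */
#include "utils/fmgroids.h"     /* F_NUMERIC_AVG_ACCUM */

/*
 * Sketch: is this aggregate's transition state safe to carry through
 * a pushed-down partial aggregation?
 */
static bool
eager_agg_trans_is_safe(Oid aggtranstype, Oid transfn_oid)
{
    /* Non-INTERNAL transition types have a fixed-size state. */
    if (aggtranstype != INTERNALOID)
        return true;

    /* avg(numeric) and sum(numeric) keep a small, bounded state. */
    if (transfn_oid == F_NUMERIC_AVG_ACCUM)
        return true;

    /*
     * Other INTERNAL states (e.g. those built by the array_agg and
     * string_agg transition functions) may grow with the input rows.
     */
    return false;
}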
* The EAGER_AGG_MIN_GROUP_SIZE threshold
This threshold defines the minimum average group size required to
consider applying eager aggregation. It was previously set to 2, but
in v18 it was increased to 20 to be cautious about planning overhead.
This change was a snap decision though, without any profiling or data
to back it.
Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
needed to consider eager aggregation for them. The resulting plans
show nice performance improvements without any measurable increase in
planning time. So, I'm inclined to lower the threshold to 10 for now.
(Wondering whether we should make this threshold a GUC, so users can
adjust it based on their needs.)
Having a GUC sounds like a good idea to me TBH. This threshold may
vary from workload to workload.
I've made this threshold a GUC, with a default value of 8 (further
benchmark testing showed that a value of 10 is still too strict for
TPC-DS query 4).
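For concreteness, a minimal sketch of how such a GUC-driven gate could
look. The helper and its arguments are assumptions on my part;
estimate_num_groups() is the real selfuncs.c estimator, called with the
same arguments the patch uses.

#include "postgres.h"

#include "nodes/pathnodes.h"
#include "utils/selfuncs.h"     /* estimate_num_groups() */

/* GUC; defaults to 8 per the discussion above */
extern PGDLLIMPORT double min_eager_agg_group_size;

/*
 * Sketch: treat eager aggregation as useful only if partial
 * aggregation shrinks the input by the configured average group size.
 */
static bool
eager_agg_is_useful(PlannerInfo *root, RelOptInfo *rel, List *group_exprs)
{
    double      input_rows = rel->cheapest_total_path->rows;
    double      ngroups = estimate_num_groups(root, group_exprs,
                                              input_rows, NULL, NULL);

    return input_rows >= ngroups * min_eager_agg_group_size;
}

A session could then opt into more aggressive pushdown with, say,
SET min_eager_agg_group_size = 2.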
Any comments on these two changes?
It sounds like a good way to go; I'm looking forward to the next
patch version so I can run some more tests.
OK. Here it is.
Thanks
Richard
Attachments:
v19-0001-Implement-Eager-Aggregation.patch
From 22999025da5f400b4b780df13dce008665c5c372 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v19] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 629 ++++++++
src/backend/utils/misc/guc_tables.c | 21 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 3658 insertions(+), 82 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index a434eb1395e..e05dcb44947 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3713,30 +3713,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 20ccb2d6b54..5400bd8f18f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5474,6 +5474,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6094,6 +6109,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
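+As a concrete illustration, take sum() as the decomposable aggregate
+(R1_g below simply names the pre-aggregated R1). The query
+
+SELECT R2.g, sum(R1.a)
+FROM R1 JOIN R2 ON R1.j = R2.j
+GROUP BY R2.g;
+
+has G = {R2.g} and J referencing R1.j, so G1 = {R1.j}, and the
+transformed query is
+
+SELECT R2.g, sum(agg_a)
+FROM (SELECT j, sum(a) AS agg_a FROM R1 GROUP BY j) R1_g
+JOIN R2 ON R1_g.j = R2.j
+GROUP BY R2.g;
+
+Each partial sum joins exactly the same R2 rows that the individual
+R1 rows in its group would have matched, so finalizing over the
+partial sums yields the same result.
+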
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
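+ *
+ * As an illustration (using a hypothetical schema), for a query like
+ *     SELECT a.x, sum(b.y) FROM a JOIN b ON a.id = b.a_id GROUP BY a.x;
+ * this function can place a partial-aggregation step on top of the paths
+ * of relation "b", yielding AGG(b) paths that are later joined to "a" and
+ * finalized above the join.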
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
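+ /* Gather the costs of the aggregates in partial-aggregation mode. */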
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /*
+ * Estimate the number of groups resulting from partial aggregation, for
+ * both non-partial and partial input paths.
+ */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses an INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with F_NUMERIC_AVG_ACCUM transition
+ * function from this check, based on the assumption that avg(numeric) and
+ * sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable, since there would
+ * be no relation left for a grouped relation to be joined to.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we simply return, leaving
+ * root->group_expr_list NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d59d6e4c6a0..d361319d0b5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3971,9 +3970,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4055,23 +4052,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7016,16 +7006,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7138,7 +7154,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7156,7 +7172,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7164,7 +7180,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7206,19 +7222,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7258,6 +7272,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7268,6 +7283,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7289,11 +7313,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7518,6 +7544,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8081,13 +8152,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4c5867cdcb..5a2e723bc29 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2818,8 +2818,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
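+ /* This node may now appear below a join, so keep the subpath's parameterization */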
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3074,8 +3073,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3122,8 +3120,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3284,8 +3281,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..bd28687dc81 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -276,6 +290,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +422,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we should not have gotten here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ *   Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +869,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1055,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2636,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent must already
+ * have been created if one is possible. So we can simply reuse the
+ * parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (group_clauses == NIL)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
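+ *
+ * For example, 10000 input rows collapsing into 100 groups gives an
+ * average group size of 100, i.e. partial aggregation would shrink the
+ * join input by roughly a factor of 100.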
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
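+ *
+ * For example, in "t1 LEFT JOIN t2", a partial aggregation pushed down
+ * to t2 runs before the join, and so never sees the NULL-extended rows
+ * that the join produces for unmatched t1 rows.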
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
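+ *
+ * For example, if t2.id appears in a join clause "t1.id = t2.id" but
+ * is not a GROUP BY column, it must still become a grouping key of the
+ * partial aggregation pushed down to t2, so that each partially
+ * aggregated row joins exactly the t1 rows that the rows of its group
+ * would have joined.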
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
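+ /*
+ * foreach() sets 'lc' to NULL when the loop runs to completion, so a
+ * non-NULL 'lc' here means we broke out upon finding the Var in some
+ * pushed-down aggregate. The Var qualifies only if it is also absent
+ * from the targetlist and havingQual.
+ */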
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d14b1678e7f..cdf8da02960 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
@@ -3980,6 +3990,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"min_eager_agg_group_size", PGC_USERSET, QUERY_TUNING_COST,
+ gettext_noop("Sets the minimum average group size required to consider applying eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &min_eager_agg_group_size,
+ 8.0, 0.0, DBL_MAX,
+ NULL, NULL, NULL
+ },
{
{"cursor_tuple_fraction", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the planner's estimate of the fraction of "
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad2726f026f..a6175cbecaf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1024,6 +1033,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * Used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1097,6 +1114,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
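+ *
+ * For example, if 1000 input rows are estimated to fall into 100 partial
+ * groups, the average group size is 10, so with the default
+ * min_eager_agg_group_size of 8.0 the grouped paths are considered useful.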
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3278,6 +3364,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
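+ *
+ * For example, in "SELECT t1.a, sum(t2.c) FROM t1 JOIN t2 ...", the
+ * AggClauseInfo for sum(t2.c) has agg_eval_at containing only t2, so its
+ * partial aggregation can be placed as soon as t2 appears in the join
+ * tree.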
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
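+ *
+ * The btree opfamily allows an expression that is not identical to any
+ * grouping expression, but is known equal to one via an EquivalenceClass,
+ * to reuse that expression's sortgroupref (see
+ * get_expression_sortgroupref()).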
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 58936e963cb..cbdbc4978f6 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..9f6bad1faca 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Make sure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform a scan of one table, partially aggregate the result, join it to
+-- the other table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, partially aggregate the result, join it to the third
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sort-based aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer joins
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY on the matching key from the other side of the join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4d5d35d0727..b764284d9c0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..052e6b7b920 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2472,6 +2474,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
On 08/08/25 22:32, Richard Guo wrote:
It sounds like a good way to go for me, looking forward to the next
patch version to perform some other tests.
OK. Here it is.
Thanks! I can now confirm that I can see eager aggregation in action
in some of the queries that I've tested on the TPC-DS benchmark.
A few questions regarding the new version:
I've noticed that when a query has a WHERE clause that filters columns
of the relation being aggregated using the "=" operator, the Partial
and Finalize aggregation nodes are not present in the EXPLAIN output,
even though setup_eager_aggregation() passes all of its checks and
RelAggInfo->agg_useful is true. For example, consider this query, which
is used in the eager aggregation paper and touches some tables from the
TPC-H benchmark:
tpch=# show enable_eager_aggregate ;
enable_eager_aggregate
------------------------
on
(1 row)
tpch=# set max_parallel_workers_per_gather to 0;
SET
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
WHERE L_RETURNFLAG = 'R'
GROUP BY O_CLERK;
QUERY PLAN
--------------------------------------------------------------
HashAggregate
Group Key: orders.o_clerk
-> Hash Join
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Seq Scan on lineitem
Filter: (l_returnflag = 'R'::bpchar)
-> Hash
-> Seq Scan on orders
(8 rows)
Debugging this query shows that none of the early-return conditions in
setup_eager_aggregation() fire, so create_agg_clause_infos() and
create_grouping_expr_infos() are called. RelAggInfo->agg_useful is also
set to true, so I would expect to see the Finalize and Partial agg
nodes. Is this correct, or am I missing something here?
Removing the WHERE clause, I can see the Finalize and Partial agg nodes:
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
GROUP BY O_CLERK;
QUERY PLAN
----------------------------------------------------------------------
Finalize HashAggregate
Group Key: orders.o_clerk
-> Merge Join
Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Partial GroupAggregate
Group Key: lineitem.l_orderkey
-> Index Scan using idx_lineitem_orderkey on lineitem
-> Index Scan using orders_pkey on orders
(8 rows)
This can also be reproduced by adding a WHERE clause to some of the
tests in eager_aggregate.sql:
postgres=# EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c = 5
GROUP BY t1.a
ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------------------
GroupAggregate
Output: t1.a, avg(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Join
Output: t1.a, t2.c
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.c, t2.b
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.c, t2.b
Filter: (t2.c = '5'::double precision)
(16 rows)
Note that if I use the ">" operator instead, this doesn't happen:
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c > 5
GROUP BY t1.a
ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------------------------
Finalize GroupAggregate
Output: t1.a, avg(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, (PARTIAL avg(t2.c))
Sort Key: t1.a
-> Hash Join
Output: t1.a, (PARTIAL avg(t2.c))
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.b, (PARTIAL avg(t2.c))
-> Partial HashAggregate
Output: t2.b, PARTIAL avg(t2.c)
Group Key: t2.b
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
Filter: (t2.c > '5'::double precision)
(19 rows)
Is this behavior correct? If it is, would it be possible to check for
this limitation in setup_eager_aggregation() and maybe skip all the
other work?
--
Matheus Alcantara
On Fri, Aug 15, 2025 at 4:22 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
Debugging this query shows that none of the early-return conditions in
setup_eager_aggregation() fire, so create_agg_clause_infos() and
create_grouping_expr_infos() are called. RelAggInfo->agg_useful is also
set to true, so I would expect to see the Finalize and Partial agg
nodes. Is this correct, or am I missing something here?
Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan. This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.
In your case, with the filter "t2.c = 5", the row estimate for t2 is
just 1 after the filter has been applied. The planner decides that
adding a partial aggregation on top of such a small result set doesn't
offer much benefit, which seems reasonable to me.
-> Hash (cost=18.50..18.50 rows=1 width=12)
(actual time=0.864..0.865 rows=1.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=1 width=12)
(actual time=0.060..0.851
rows=1.00 loops=1)
Filter: (c = '5'::double precision)
Rows Removed by Filter: 999
With the filter "t2.c > 5", the row estimate for t2 is 995 after
filtering. A partial aggregation can reduce that to 10 rows, so the
planner decides that adding a partial aggregation is beneficial -- and
does so. That also seems reasonable to me.
-> Partial HashAggregate (cost=23.48..23.58 rows=10 width=36)
(actual time=2.427..2.438 rows=10.00 loops=1)
Group Key: t2.b
Batches: 1 Memory Usage: 32kB
-> Seq Scan on eager_agg_t2 t2 (cost=0.00..18.50 rows=995 width=12)
(actual time=0.053..0.989
rows=995.00 loops=1)
Filter: (c > '5'::double precision)
Rows Removed by Filter: 5
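For what it's worth, the group estimate driving that decision can be
sanity-checked from the statistics. A minimal sketch, using the
eager_agg_t2 table from the regression script (assuming it has been
ANALYZEd):

    -- number of distinct values of the grouping column t2.b
    SELECT n_distinct FROM pg_stats
    WHERE tablename = 'eager_agg_t2' AND attname = 'b';

With n_distinct = 10, the ~995 rows that survive "t2.c > 5" collapse
into about 10 partial groups, i.e. an average group size near 100,
comfortably above the min_eager_agg_group_size default of 8. With
"t2.c = 5" only one row survives, so there is nothing for a partial
aggregation to reduce.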
Is this behavior correct? If it is, would it be possible to check for
this limitation in setup_eager_aggregation() and maybe skip all the
other work?
Hmm, I wouldn't consider this a limitation; it's just the result of
the planner's cost-based tournament for path selection.
Thanks
Richard
On Sat, Aug 9, 2025 at 10:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
OK. Here it is.
This patch needs a rebase; here it is. No changes were made.
- Richard
Attachments:
v20-0001-Implement-Eager-Aggregation.patch
From 63378cda1912f8bca3455e374638ba02ce1ad651 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v20] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
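As a rough illustration of this placement rule (hypothetical tables;
not taken from the regression tests):

    SELECT a.g, sum(b.v)
    FROM a JOIN b ON a.k = b.k
           JOIN c ON b.k = c.k
    GROUP BY a.g;

All aggregated columns come from b, so partial aggregation is first
considered on b alone. Assuming grouping b by b.k already reduces its
row count enough to be deemed useful, the planner does not also build
partially-aggregated versions of (a JOIN b) or (b JOIN c) for the same
grouped relation.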
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
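A concrete failure case (again a sketch with hypothetical tables):

    SELECT t2.b, count(*)
    FROM t1 LEFT JOIN t2 ON t1.x = t2.x
    GROUP BY t2.b;

Every t1 row without a join partner contributes a NULL-extended row to
the t2.b IS NULL group. If the aggregation were partially applied to
t2's side below the join, those NULL-extended rows would not yet exist
when the partial groups are formed, and the counts for that group would
come out wrong.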
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 ++++++++
src/backend/utils/misc/guc_tables.c | 21 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3653 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 78b8367d289..b6c892bdb51 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a4b3e55ba5..aab91625daf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..5af3ced5750 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level at which partial aggregation is applied,
+ * as recorded in whichever input relation is grouped.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so, collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Check whether any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses an INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
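+ *
+ * For example, array_agg(x)'s transition state accumulates every input
+ * value seen so far, so a partially aggregated row's state can be as wide
+ * as the entire partial group it summarizes.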
+ *
+ * We explicitly exclude aggregates with F_NUMERIC_AVG_ACCUM transition
+ * function from this check, based on the assumption that avg(numeric) and
+ * sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable: the aggregates
+ * could only be evaluated once all relations are joined, leaving no
+ * join above the partial aggregation to benefit from it.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should not arrive here unless aggregate expressions and grouping
+ * expressions are available.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one, whose
+ * reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel, if one
+ * can be built at all, must have been created already. So we can just
+ * use the parent's RelAggInfo if there is one, with appropriate
+ * variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
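+ *
+ * For example, in "SELECT b.x, count(*) FROM a LEFT JOIN b ON a.x = b.x
+ * GROUP BY b.x", the rows NULL-extended for unmatched rows of a exist
+ * only above the join, so a partial aggregation pushed down to b could
+ * never see them.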
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * Such expressions must be included in the grouping keys to ensure
+ * that an aggregated row from the partial aggregation matches the
+ * other side of the join if and only if each row in the partial group
+ * does. This guarantees that all rows within the same partial group
+ * share the same 'destiny', which is crucial for correctness.
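+ *
+ * For example, in "SELECT a.g, sum(b.v) FROM a JOIN b ON a.x = b.y
+ * GROUP BY a.g", b.y must become a grouping key of the partial
+ * aggregation pushed down to b, so that each partially aggregated row
+ * joins to exactly the same rows of a as its constituent rows would.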
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
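+ *
+ * For example, in "SELECT a.g, sum(b.v) FROM a JOIN b ON a.x = b.y
+ * GROUP BY a.g", b.v appears only within the aggregate expression, so it
+ * must be emitted by the input paths of the partial aggregation but not
+ * by the grouped paths themselves.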
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
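+ *
+ * For example, given "WHERE t1.a = t2.a GROUP BY t1.a", t2.a is known
+ * equal to the grouping expression t1.a through their EquivalenceClass,
+ * so it inherits t1.a's sortgroupref.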
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f137129209f..d3bfcaf0784 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -965,6 +965,16 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
@@ -4050,6 +4060,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"min_eager_agg_group_size", PGC_USERSET, QUERY_TUNING_COST,
+ gettext_noop("Sets the minimum average group size required to consider applying eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &min_eager_agg_group_size,
+ 8.0, 0.0, DBL_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"cursor_tuple_fraction", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the planner's estimate of the fraction of "
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
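+ *
+ * For example, an input of 1000 rows that collapses into 100 groups has an
+ * average group size of 10, which passes the default
+ * min_eager_agg_group_size of 8.0.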
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
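+ *
+ * For example, for avg(t2.c) in a join of t1 and t2, agg_eval_at is {t2},
+ * so partial aggregation of this Aggref can be pushed down to t2 or to any
+ * join that includes t2.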
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Make sure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform a scan of one table, aggregate the result, join it to the other
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
On Mon, Sep 1, 2025 at 10:32 AM Richard Guo <guofenglinux@gmail.com> wrote:
> This patch needs a rebase; here it is. No changes were made.
Here is a rebase after the GUC tables change.
- Richard
Attachments:
v21-0001-Implement-Eager-Aggregation.patch (application/octet-stream)
From 3f839b71eb76f9e662f0768ad2aff600d500748f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v21] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
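As a sketch of the idea, using two made-up tables t1 and t2 (this query
is not part of the patch), eager aggregation turns

    SELECT t1.a, sum(t2.c)
    FROM t1 JOIN t2 ON t1.b = t2.b
    GROUP BY t1.a;

into the equivalent of

    SELECT t1.a, sum(p.s)                    -- finalize the partial results
    FROM t1 JOIN (SELECT b, sum(c) AS s      -- partial aggregation, pushed
                  FROM t2 GROUP BY b) p      -- below the join
           ON t1.b = p.b
    GROUP BY t1.a;

For aggregates such as avg() the pushed-down step carries a partial
state rather than a single value, but the plan shape is the same.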
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
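As a sketch, using the eager_agg_t2/eager_agg_t3 tables from the
eager_aggregate regression test and writing the PARTIAL machinery out as
plain sum/count, the first method corresponds to a query shape like

    -- partial aggregation placed directly on top of a join's output:
    SELECT t2.b, sum(t2.c) AS s, count(t2.c) AS n
    FROM eager_agg_t2 t2 JOIN eager_agg_t3 t3 ON t2.a = t3.a
    GROUP BY t2.b;

while the second propagates an already-grouped input through a join:

    -- joining the grouped relation {t2} to the non-grouped t3; the join
    -- column a has to be among the grouping keys (see below):
    SELECT g.b, g.s, g.n
    FROM (SELECT b, a, sum(c) AS s, count(c) AS n
          FROM eager_agg_t2 GROUP BY b, a) g
    JOIN eager_agg_t3 t3 ON g.a = t3.a;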
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 ++++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3648 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 78b8367d289..b6c892bdb51 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a4b3e55ba5..aab91625daf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
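For illustration, a minimal way to exercise these two settings; the table
names are hypothetical (the same ones used in the README example below),
not part of the patch or its tests:

    SET enable_eager_aggregate = on;
    SET min_eager_agg_group_size = 4;  -- allow pushdown for smaller groups
    EXPLAIN (COSTS OFF)
    SELECT c.region, sum(o.amount)
    FROM orders o JOIN customers c ON o.cust_id = c.id
    GROUP BY c.region;

When eager aggregation wins, the plan should show a Partial aggregate
node below the join and a Finalize aggregate node above it, much like
the updated postgres_fdw expected output earlier in this patch.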
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..5af3ced5750 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+finalizing the partial aggregates after the join yields the same
+result as performing the entire aggregation after the join, thereby
+preserving query semantics.  Q.E.D.
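+
+As a concrete, purely illustrative example, assume hypothetical tables
+orders(cust_id, amount) as R1 and customers(id, region) as R2, and the
+query
+
+SELECT c.region, sum(o.amount)
+FROM orders o JOIN customers c ON o.cust_id = c.id
+GROUP BY c.region;
+
+Here G = {c.region}, A = {o.amount}, and J is (o.cust_id = c.id), so
+G1 = {o.cust_id}: no grouping key comes from R1, and cust_id is the
+only R1 column appearing in J.  Since sum() is decomposable, the query
+is equivalent to
+
+SELECT c.region, sum(o.partial_sum)
+FROM (SELECT cust_id, sum(amount) AS partial_sum
+      FROM orders GROUP BY cust_id) o
+JOIN customers c ON o.cust_id = c.id
+GROUP BY c.region;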
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
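+
+For example (again purely illustrative), consider
+
+SELECT c.id, count(*)
+FROM customers c LEFT JOIN orders o ON o.cust_id = c.id
+GROUP BY c.id;
+
+A customer with no matching orders must report count(*) = 1, because
+the NULL-extended row produced by the outer join still counts.  If the
+partial count were computed below the join on the orders side, such a
+customer would find no partial group at all, and no combination of
+partial states could recover that row's contribution.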
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with F_NUMERIC_AVG_ACCUM transition
+ * function from this check, based on the assumption that avg(numeric) and
+ * sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we would not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
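+ /*
+ * Link the grouped rel from the plain rel, so that it can be found
+ * later, e.g. by child rels looking up their top parent's grouped rel
+ * in create_rel_agg_info().
+ */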
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must
+ * already have been created if that was possible. So we can just use
+ * the parent's RelAggInfo, if there is one, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
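+ /*
+ * The copied Aggref is now marked to produce a partial transition
+ * value; such aggregates appear as "PARTIAL avg(...)" in EXPLAIN
+ * VERBOSE output.
+ */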
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
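+ /*
+ * For example, with rel->rows = 1000 and grouped_rows = 125, the
+ * average group size is 8, which satisfies the default
+ * min_eager_agg_group_size of 8.0.
+ */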
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
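+ *
+ * For example, in "SELECT t2.b, avg(t2.c) FROM t1 LEFT JOIN t2 ON
+ * t1.b = t2.b GROUP BY t2.b", the NULL-extended t2 rows generated by
+ * the left join form a NULL group in the final result, which a partial
+ * aggregate computed below the join could never see.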
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation would collapse
+ * rows prematurely and reduce the number of input rows the higher
+ * aggregate receives. If the aggregate does not require the current
+ * relation at all, the current relation should not be grouped, as we
+ * do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
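+ * (The classic counterexample is numeric, where 1.0 and 1.00 compare
+ * equal but have different binary representations.)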
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var is referenced only within aggregate
+ * expressions, and not elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
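+ /*
+ * The Var qualifies iff the loop above exited early, i.e. the Var was
+ * found in some aggregate, and the Var is not referenced in the
+ * targetlist or havingQual.
+ */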
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
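+ /*
+ * (attr_needed uses relid 0 to denote the final targetlist, so
+ * including 0 in "relids" keeps those references from counting as
+ * join uses.)
+ */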
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
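+ *
+ * For example, given "GROUP BY t1.a" and a join clause "t1.a = t2.a",
+ * t2.a can inherit t1.a's sortgroupref through their equivalence
+ * class.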
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index a157cec3c4d..466aabb8cf0 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2421,6 +2428,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
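+ *
+ * For example, for avg(t2.c + t3.c) the agg_eval_at set is {t2, t3}, so
+ * this aggregate can be partially applied at the join of t2 and t3 or
+ * above, but not at t2 or t3 alone.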
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Make sure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
Sorry for the slow response.
On Fri, Jun 13, 2025 at 3:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
The transformation of eager aggregation is:
GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
=
GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
JOIN R2 ON J)
This equivalence holds under the following conditions:
1) AGG is decomposable, meaning that it can be computed in two stages:
a partial aggregation followed by a final aggregation;
2) The set G1 used in the pre-aggregation of R1 includes:
* all columns from R1 that are part of the grouping keys G, and
* all columns from R1 that appear in the join condition J.
3) The grouping operator for any column in G1 must be compatible with
the operator used for that column in the join condition J.
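As a plain-SQL illustration of the rewrite (a sketch with hypothetical
tables r1(j, a) and r2(j, g), not taken from the patch's tests):

-- original shape: join first, aggregate once at the top
SELECT r2.g, sum(r1.a)
FROM r1 JOIN r2 ON r1.j = r2.j
GROUP BY r2.g;

-- eagerly aggregated shape: pre-aggregate r1 on its join column
-- (G1 = {r1.j}, per condition 2), then finalize above the join;
-- sum() satisfies condition 1 since summing the partial sums gives
-- the same result
SELECT r2.g, sum(p.psum)
FROM (SELECT j, sum(a) AS psum FROM r1 GROUP BY j) p
JOIN r2 ON p.j = r2.j
GROUP BY r2.g;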
This proof seems to ignore join-order constraints. I'm not sure to
what degree that influences the ultimate outcome here, but given A
LEFT JOIN (B INNER JOIN C), we cannot simply decide that A and C
comprise R1 and B comprises R2, because it is not actually possible to
do the A-C join first and treat the result as a relation to be joined
to B. That said, I do very much like the explicit enumeration of
criteria that must be met for the optimization to be valid. That makes
it a lot easier to evaluate whether the theory of the patch is
correct.
To address these concerns, I'm thinking that maybe we can adopt a
strategy where partial aggregation is only pushed to the lowest
possible level in the join tree that is deemed useful. In other
words, if we can build a grouped path like "AGG(B) JOIN A" -- and
AGG(B) yields a significant reduction in row count -- we skip
exploring alternatives like "AGG(A JOIN B)".
I really like this idea. I believe we need some heuristic here and
this seems like a reasonable one. I think there could be a better one,
potentially. For instance, it would be reasonable (in my opinion) to
do some kind of evaluation of AGG(A JOIN B) vs. AGG(B) JOIN A that
does not involve performing full path generation for both cases; e.g.
one could try to decide considering only row counts, for instance.
However, I'm not saying that would work better than your proposal
here, or that it should be a requirement for this to be committed;
it's just an idea. IMHO, the requirement to have something committable
is that there is SOME heuristic limiting the search space and at the
same time the patch can still be demonstrated to give SOME benefit. I
think what you propose here meets those criteria. I also like the fact
that it's simple and easy to understand. If it does go wrong, it will
not be too difficult for someone to understand why it has gone wrong,
which is very desirable.
I think this heuristic serves as a good starting point, and we can
look into extending it with more advanced strategies as the feature
evolves.
So IOW, +1 to what you say here.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
To avoid potential memory blowout risks from large partial aggregation
values, v18 avoids applying eager aggregation if any aggregate uses an
INTERNAL transition type, as this typically indicates a large internal
data structure (as in string_agg or array_agg). However, this also
excludes aggregates like avg(numeric) and sum(numeric), which are
actually safe to use with eager aggregation.
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
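For reference, the aggregates that rule would exclude can be listed
directly from the catalogs (a sketch that assumes only existing
pg_aggregate columns): those with an INTERNAL transition state that
nevertheless support partial aggregation:

SELECT aggfnoid::regprocedure
FROM pg_aggregate
WHERE aggtranstype = 'internal'::regtype
  AND aggcombinefn <> 0;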
This strategy seems fairly unfriendly towards out-of-core code. Can
you come up with something that allows the author of a SQL-callable
function to include or exclude the function by a choice that is under
their control, rather than hard-coding something in PostgreSQL itself?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 5, 2025 at 3:35 AM Richard Guo <guofenglinux@gmail.com> wrote:
Here is a rebase after the GUC tables change.
I spent a bit of time scrolling through this today. Here are a few
observations/review comments.
It looks as though this will create a bunch of RelOptInfo objects that
don't end up getting used for anything once the apply_at test in
generate_grouped_paths() fails. It seems to me that it would be better
to altogether avoid generating the RelOptInfo in that case.
I think it would be worth considering generating the partially grouped
relations in a second pass. Right now, as you progress from the bottom
of the join tree towards the top, you create grouped rels as you go.
But you could equally well finish planning everything up to the
scan/join target first and then go back and add grouped_rels to
relations where it seems worthwhile. I don't know if this would really
make a big difference as you have things today, but I think it might
provide a better structure for the future, because you would then
have a lot more information with which to judge where to do
aggregation. For instance, you could look at the row counts of any
number of those ungrouped-rels before deciding where to put the
partial aggregation. That seems like it could be pretty valuable.
I haven't done a detailed comparison of generate_grouped_paths() to
other parts of the code, but I have an uncomfortable feeling that it
might be rather similar to some existing code that probably already
exists in multiple, slightly-different versions. Is there any
refactoring we could do here?
Do you need a test of this feature in combination with GEQO? You have
code for it but I don't immediately see a test. I didn't check
carefully, though.
Overall I like the direction this is heading. I don't feel
well-qualified to evaluate whether all of the things that you're doing
are completely safe. The logic in is_var_in_aggref_only() and
is_var_needed_by_join() scares me a bit because I worry that the
checks are somehow non-exhaustive, but I don't know of a specific
hazard. That said, I think that modulo such issues, this has a good
chance of significantly improving performance for certain query
shapes.
One thing to check might be whether you can construct any cases where
the strategy is applied too boldly. Given the safeguards you've put in
place, that seems a little hard to construct. The most obvious
thing that occurs to me is an aggregate where combining is more
expensive than aggregating, so that the partial aggregation gives the
appearance of saving more work than it really does, but I can't
immediately think of a problem case. Another case could be where the
row counts are off, leading to us mistakenly believing that we're
going to reduce the number of rows that need to be processed when we
really don't. Of course, such a case would arguably be a fault of the
bad row-count estimate rather than this patch, but if the patch has
that problem frequently, it might need to be addressed. Still, I have
a feeling that the testing you've already been doing might have
surfaced such cases if they were common. Have you looked into how many
queries in the regression tests, or in TPC-H/DS, expend significant
planning effort on this strategy before discarding it? That might be a
good way to get a sense of whether the patch is too aggressive, not
aggressive enough, a mix of the two, or just right.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
needed to consider eager aggregation for them. The resulting plans
show nice performance improvements without any measurable increase in
planning time. So, I'm inclined to lower the threshold to 10 for now.
(Wondering whether we should make this threshold a GUC, so users can
adjust it based on their needs.)
Like Matheus, I think a GUC is reasonable. A significant danger here
appears to be the possibility of a performance cliff, where queries
are optimized very differently when the ratio is 9.99 vs. 10.01, say. It
would be nice if there were some way to mitigate that danger, but at
least a GUC avoids chaining the performance of the whole system to a
hard-coded value.
It might be worth considering whether there are heuristics other than
the group size that could help here. Possibly that's just making
things more complicated to no benefit. It seems to me, for example,
that reducing 100 rows to 10 is quite different from reducing a
million rows to 100,000. On the whole, the latter seems more likely to
work out well, but it's tricky, because the effort expended per group
can be arbitrarily high. I think we do want to let the cost model make
most of the decisions, and just use this threshold to prune ideas that
are obviously bad at an early stage. That said, it's worth thinking
about how this interacts with the just-considered-one-eager-agg
strategy. Does this threshold apply before or after that rule?
For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a
count of orders grouped by customer name. Aggregating on the dimension
table (in this case, the list of customers) is probably useless, but
aggregating on the join column of the fact table has a good chance of
being useful. If we consider only one of those strategies, we want it
to be the right one. This threshold could be the thing that helps us
to get it right.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Sep 5, 2025 at 10:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jun 13, 2025 at 3:42 AM Richard Guo <guofenglinux@gmail.com> wrote:
The transformation of eager aggregation is:
GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
=
GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
JOIN R2 ON J)
This equivalence holds under the following conditions:
1) AGG is decomposable, meaning that it can be computed in two stages:
a partial aggregation followed by a final aggregation;
2) The set G1 used in the pre-aggregation of R1 includes:
* all columns from R1 that are part of the grouping keys G, and
* all columns from R1 that appear in the join condition J.
3) The grouping operator for any column in G1 must be compatible with
the operator used for that column in the join condition J.
This proof seems to ignore join-order constraints. I'm not sure to
what degree that influences the ultimate outcome here, but given A
LEFT JOIN (B INNER JOIN C), we cannot simply decide that A and C
comprise R1 and B comprises R2, because it is not actually possible to
do the A-C join first and treat the result as a relation to be joined
to B. That said, I do very much like the explicit enumeration of
criteria that must be met for the optimization to be valid. That makes
it a lot easier to evaluate whether the theory of the patch is
correct.
Thanks for pointing this out. I should have clarified that the proof
is intended for the inner join case. My plan was to first establish
the correctness for inner joins, and then extend the proof to cover
outer joins, but I failed to make that clear.
In the case where there are any outer joins, the situation becomes
more complex due to join order constraints and the semantics of
null-extension in outer joins. If the relations that contain at least
one aggregation column cannot be treated as a single relation because
of the join order constraints, partial aggregation paths will not be
generated, and thus the transformation is not applicable.
Otherwise, to preserve correctness, we need to add an additional
condition: R1 must not be on the nullable side of any outer join.
This ensures that partial aggregation over R1 does not suppress any
null-extended rows that would be introduced by outer joins.
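The pair of outer-join queries in the regression test above exercises
exactly this rule; restating them here:

-- safe: t2, the side being aggregated (it supplies avg(t2.c)), is on
-- the non-nullable side of the join
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a ORDER BY t1.a;

-- unsafe: t2 is on the nullable side of the LEFT JOIN, so a partial
-- aggregate computed below the join could suppress the null-extended
-- rows the join must produce for unmatched t1 rows
SELECT t2.b, avg(t2.c)
FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t2.b ORDER BY t2.b;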
I'll update the proof in README to cover the outer join case.
To address these concerns, I'm thinking that maybe we can adopt a
strategy where partial aggregation is only pushed to the lowest
possible level in the join tree that is deemed useful. In other
words, if we can build a grouped path like "AGG(B) JOIN A" -- and
AGG(B) yields a significant reduction in row count -- we skip
exploring alternatives like "AGG(A JOIN B)".
I really like this idea. I believe we need some heuristic here and
this seems like a reasonable one. I think there could be a better one,
potentially. For instance, it would be reasonable (in my opinion) to
do some kind of evaluation of AGG(A JOIN B) vs. AGG(B) JOIN A that
does not involve performing full path generation for both cases; e.g.
one could try to decide considering only row counts, for instance.
However, I'm not saying that would work better than your proposal
here, or that it should be a requirement for this to be committed;
it's just an idea. IMHO, the requirement to have something committable
is that there is SOME heuristic limiting the search space and at the
same time the patch can still be demonstrated to give SOME benefit. I
think what you propose here meets those criteria. I also like the fact
that it's simple and easy to understand. If it does go wrong, it will
not be too difficult for someone to understand why it has gone wrong,
which is very desirable.
I think this heuristic serves as a good starting point, and we can
look into extending it with more advanced strategies as the feature
evolves.
So IOW, +1 to what you say here.
Thanks for liking this idea. Another way this heuristic makes life
easier is that it ensures all grouped paths for the same grouped
relation produce the same set of rows. This means we don't need all
the hacks for comparing costs between grouped paths, nor do we have to
resolve disputes about how many RelOptInfos to create for a single
grouped relation. I'd prefer to keep this property for now and
explore more complex heuristics in the future.
- Richard
On Fri, Sep 5, 2025 at 10:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <guofenglinux@gmail.com> wrote:
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
This strategy seems fairly unfriendly towards out-of-core code. Can
you come up with something that allows the author of a SQL-callable
function to include or exclude the function by a choice that is under
their control, rather than hard-coding something in PostgreSQL itself?
Yeah, ideally we should tell whether an aggregate's transition state
may grow unbounded just by looking at system catalogs. Unfortunately,
after trying for a while, it seems to me that the current catalog
doesn't provide enough information.
I once considered adding a flag (e.g., aggtransbounded) to catalog
pg_aggregate to indicate whether the transition state size is bounded.
This flag could be specified by users when creating aggregate
functions, and then leveraged by features such as eager aggregation.
However, adding new information to system catalogs involves a lot of
discussions and changes, including updates to DDL commands, dump and
restore processes, and upgrade procedures. Therefore, to keep the
focus of this patch on the eager aggregation feature itself, I prefer
to treat this enhancement as future work.
- Richard
On Fri, Sep 5, 2025 at 11:37 PM Robert Haas <robertmhaas@gmail.com> wrote:
I spent a bit of time scrolling through this today. Here are a few
observations/review comments.
Thanks for all the comments.
It looks as though this will create a bunch of RelOptInfo objects that
don't end up getting used for anything once the apply_at test in
generate_grouped_paths() fails. It seems to me that it would be better
to altogether avoid generating the RelOptInfo in that case.
Hmm, that's not the case. make_grouped_join_rel() guarantees that for
a given relation, if its grouped paths are not considered useful, and
no grouped paths can be built by joining grouped input relations, then
its grouped relation will not be created. IOW, we only create a
grouped RelOptInfo if we've determined that we can generate useful
grouped paths for it.
In the case you mentioned, where the apply_at test in
generate_grouped_paths() fails, it must mean that grouped paths can be
built by joining its outer and inner relations. Also, note that calls
to generate_grouped_paths() are always followed by calls to
set_cheapest(). If we failed to generate any grouped paths for a
grouped relation, the set_cheapest() call should already have reported
an error.
I think it would be worth considering generating the partially grouped
relations in a second pass. Right now, as you progress from the bottom
of the join tree towards the top, you created grouped rels as you go.
But you could equally well finish planning everything up to the
scan/join target first and then go back and add grouped_rels to
relations where it seems worthwhile.
Hmm, I don't think so. I think the presence of eager aggregation
could change the best join order. For example, without eager
aggregation, the optimizer might find that (A JOIN B) JOIN C is the best
join order. But with eager aggregation on B, the optimizer could
prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
best join order with eager aggregation applied without building the
join tree from the bottom up.
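Schematically (plan shapes only, not actual EXPLAIN output):

Without eager aggregation:
  Agg
    -> Join
         -> Join
              -> Scan on A
              -> Scan on B
         -> Scan on C

With eager aggregation on B:
  Finalize Agg
    -> Join
         -> Scan on A
         -> Join
              -> Partial Agg
                   -> Scan on B
              -> Scan on C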
I haven't done a detailed comparison of generate_grouped_paths() to
other parts of the code, but I have an uncomfortable feeling that it
might be rather similar to some existing code that probably already
exists in multiple, slightly-different versions. Is there any
refactoring we could do here?
Yeah, we currently have several functions that do similar, but not
exactly the same, things. Maybe some refactoring is possible -- maybe
not -- I haven't looked into it closely yet. However, I'd prefer to
address that in a separate patch if possible, since this issue also
exists on master, and I want to avoid introducing such changes in this
already large patch.
Do you need a test of this feature in combination with GEQO? You have
code for it but I don't immediately see a test. I didn't check
carefully, though.
Good point. I have manually tested GEQO by setting geqo_threshold
to 2 and running the regression tests to check for any planning
errors, crashes, or incorrect results. However, I'm not sure where
test cases for GEQO should be added. I searched the regression tests
and found only one explicit GEQO test, added back in 2009 (commit
a43b190e3). It's not quite clear to me what the current policy is for
adding GEQO test cases.
Anyway, I will add some test cases in eager_aggregate.sql with
geqo_threshold set to 2.
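A sketch of what such a test might look like (reusing a query from
eager_aggregate.sql above; the exact details are still open):

SET geqo TO on;
SET geqo_threshold TO 2;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a ORDER BY t1.a;
RESET geqo;
RESET geqo_threshold;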
Overall I like the direction this is heading. I don't feel
well-qualified to evaluate whether all of the things that you're doing
are completely safe. The logic in is_var_in_aggref_only() and
is_var_needed_by_join() scares me a bit because I worry that the
checks are somehow non-exhaustive, but I don't know of a specific
hazard. That said, I think that modulo such issues, this has a good
chance of significantly improving performance for certain query
shapes.
One thing to check might be whether you can construct any cases where
the strategy is applied too boldly. Given the safeguards you've put in
place, that seems a little hard to construct. The most obvious
thing that occurs to me is an aggregate where combining is more
expensive than aggregating, so that the partial aggregation gives the
appearance of saving more work than it really does, but I can't
immediately think of a problem case. Another case could be where the
row counts are off, leading to us mistakenly believing that we're
going to reduce the number of rows that need to be processed when we
really don't. Of course, such a case would arguably be a fault of the
bad row-count estimate rather than this patch, but if the patch has
that problem frequently, it might need to be addressed. Still, I have
a feeling that the testing you've already been doing might have
surfaced such cases if they were common. Have you looked into how many
queries in the regression tests, or in TPC-H/DS, expend significant
planning effort on this strategy before discarding it? That might be a
good way to get a sense of whether the patch is too aggressive, not
aggressive enough, a mix of the two, or just right.
I previously looked into the TPC-DS queries where eager aggregation
was applied and didn't observe any regressions in planning time or
execution time. I can run TPC-DS again to check the planning time for
the remaining queries.
- Richard
On Fri, Sep 5, 2025 at 11:50 PM Robert Haas <robertmhaas@gmail.com> wrote:
Like Matheus, I think a GUC is reasonable. A significant danger here
appears to be the possibility of a performance cliff, where queries
are optimized very differently when the ratio is 9.99 vs. 10.01, say. It
would be nice if there were some way to mitigate that danger, but at
least a GUC avoids chaining the performance of the whole system to a
hard-coded value.
Yeah, I think the performance cliff issue does exist. It might be
mitigated by carefully selecting the threshold value to ensure that
small differences in the average group size near the boundary don't
cause big performance swings with and without eager aggregation, but
this doesn't seem like an easy task.
How is this issue avoided in other thresholds? For example, with
min_parallel_table_scan_size, is there a performance cliff when the
table size is 7.99MB vs. 8.01MB, where a parallel scan is considered
in the latter case but not the former?
It might be worth considering whether there are heuristics other than
the group size that could help here. Possibly that's just making
things more complicated to no benefit. It seems to me, for example,
that reducing 100 rows to 10 is quite different from reducing a
million rows to 100,000. On the whole, the latter seems more likely to
work out well, but it's tricky, because the effort expended per group
can be arbitrarily high. I think we do want to let the cost model make
most of the decisions, and just use this threshold to prune ideas that
are obviously bad at an early stage. That said, it's worth thinking
about how this interacts with the just-considered-one-eager-agg
strategy. Does this threshold apply before or after that rule?
If I understand correctly, this means that we need to explore each
join level to find the optimal position for applying partial
aggregation. For example, suppose Agg(B) reduces 100 rows to 10, while
Agg(A JOIN B) reduces a million rows to 100,000; it might be better to
apply partial aggregation at the (A JOIN B) level rather than just
over B. However, that's not always the case: the Agg(B) option can
reduce the number of input rows to the join earlier, potentially
outperforming the Agg(A JOIN B) approach. Therefore, we need to
consider both options and compare their costs.
This is actually what the patch used to do before I introduced the
always-push-to-lowest heuristic.
For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a
count of orders grouped by customer name. Aggregating on the dimension
table (in this case, the list of customers) is probably useless, but
aggregating on the join column of the fact table has a good chance of
being useful. If we consider only one of those strategies, we want it
to be the right one. This threshold could be the thing that helps us
to get it right.
Now I see what you meant. However, in the current implementation, we
only push partial aggregation down to relations that contain all the
aggregation columns. So, in the case you mentioned, if the
aggregation columns come from the dimension table, unfortunately, we
don't have the option to partially aggregate the fact table.
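To make that concrete (hypothetical orders/customers tables and my
reading of the current rule, not queries from the patch's tests):

-- count(*) references no columns, so nothing ties the partial
-- aggregate to either side; pre-aggregating orders on its join
-- column is the useful option here
SELECT c.name, count(*)
FROM orders o JOIN customers c ON o.customer_id = c.id
GROUP BY c.name;

-- avg(c.credit_limit)'s aggregation column lives in customers, so
-- under the current rule only customers could be pre-aggregated;
-- pre-aggregating orders is not an option
SELECT c.name, avg(c.credit_limit)
FROM orders o JOIN customers c ON o.customer_id = c.id
GROUP BY c.name;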
The paper does discuss several other transformations, such as "Eager
Count", "Double Eager", and "Eager Split", that can perform partial
aggregation on relations that don't contain aggregation columns, or
even on both sides of the join. However, those are beyond the scope
of this patch.
- Richard
On Tue, Sep 9, 2025 at 5:20 AM Richard Guo <guofenglinux@gmail.com> wrote:
Yeah, ideally we should tell whether an aggregate's transition state
may grow unbounded just by looking at system catalogs. Unfortunately,
after trying for a while, it seems to me that the current catalog
doesn't provide enough information.
I once considered adding a flag (e.g., aggtransbounded) to catalog
pg_aggregate to indicate whether the transition state size is bounded.
This flag could be specified by users when creating aggregate
functions, and then leveraged by features such as eager aggregation.
However, adding new information to system catalogs involves a lot of
discussions and changes, including updates to DDL commands, dump and
restore processes, and upgrade procedures. Therefore, to keep the
focus of this patch on the eager aggregation feature itself, I prefer
to treat this enhancement as future work.
I don't really like that. I think there's a lot of danger of that
future work never getting done, and thus leaving us stuck more-or-less
permanently with a system that's not really extensible. Data type and
function extensibility is one of the strongest areas of PostgreSQL,
and we should try hard to avoid situations where we regress it. I'm
not sure whether the aggtransbounded flag is exactly the right thing
here, but I don't think adding a new catalog column is an unreasonable
amount of work for a feature of this type.
Having said that, I wonder whether there's some way that we could use
the aggtransspace property for this. For instance, for stanullfrac, we
use values >0 to mean absolute quantities and values <0 to mean
proportions. The current definition of aggtranspace assigns no meaning
to values <0, and the current coding seems to assume that sizes are
fixed regardless of how many inputs are supplied. Maybe we could
define aggtransspace<0 to mean that the number of bytes used per input
value is the additive inverse of the value, or something like that.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 9, 2025 at 6:30 AM Richard Guo <guofenglinux@gmail.com> wrote:
I think it would be worth considering generating the partially grouped
relations in a second pass. Right now, as you progress from the bottom
of the join tree towards the top, you create grouped rels as you go.
But you could equally well finish planning everything up to the
scan/join target first and then go back and add grouped_rels to
relations where it seems worthwhile.
Hmm, I don't think so. I think the presence of eager aggregation
could change the best join order. For example, without eager
aggregation, the optimizer might find (A JOIN B) JOIN C to be the best
join order. But with eager aggregation on B, the optimizer could
prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
best join order with eager aggregation applied without building the
join tree from the bottom up.
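A hypothetical query where this matters (invented schema):

  -- The only aggregation column, b.y, comes from B.  If AGG(B)
  -- shrinks B enough, A JOIN (AGG(B) JOIN C) can beat the join
  -- order chosen without eager aggregation.
  SELECT a.x, COUNT(b.y)
  FROM a JOIN b ON a.id = b.a_id
         JOIN c ON b.c_id = c.id
  GROUP BY a.x;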
Oh, that is a problem, yes. :-(
I haven't done a detailed comparison of generate_grouped_paths() to
other parts of the code, but I have an uncomfortable feeling that it
might be rather similar to some existing code that probably already
exists in multiple, slightly-different versions. Is there any
refactoring we could do here?

Yeah, we currently have several functions that do similar, but not
exactly the same, things. Maybe some refactoring is possible -- maybe
not -- I haven't looked into it closely yet. However, I'd prefer to
address that in a separate patch if possible, since this issue also
exists on master, and I want to avoid introducing such changes in this
already large patch.
Well, it's not just a matter of "this already exists" -- it gets
harder and harder to unify things the more near-copies you add.
Good point. I have manually tested GEQO by setting geqo_threshold
to 2 and running the regression tests to check for any planning
errors, crashes, or incorrect results. However, I'm not sure where
test cases for GEQO should be added. I searched the regression tests
and found only one explicit GEQO test, added back in 2009 (commit
a43b190e3). It's not quite clear to me what the current policy is for
adding GEQO test cases.

Anyway, I will add some test cases in eager_aggregate.sql with
geqo_threshold set to 2.
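Presumably something along these lines (a sketch, not the actual
test):

  -- Force the GEQO join search even for small join problems, then
  -- check that eager aggregation still produces a correct plan.
  SET geqo = on;
  SET geqo_threshold = 2;
  EXPLAIN (COSTS OFF)
  SELECT t1.a, SUM(t2.c)
  FROM t1 JOIN t2 ON t1.b = t2.b
  GROUP BY t1.a;
  RESET geqo_threshold;
  RESET geqo;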
Sounds good. I think GEQO is mostly-unmaintained these days, but if
we're updating the code, I think it is good to add tests. Being that
the code is so old, it probably lacks adequate test coverage.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 9, 2025 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
Having said that, I wonder whether there's some way that we could use
the aggtransspace property for this. For instance, for stadistinct, we
use values >0 to mean absolute quantities and values <0 to mean
proportions. The current definition of aggtransspace assigns no meaning
to values <0, and the current coding seems to assume that sizes are
fixed regardless of how many inputs are supplied. Maybe we could
define aggtransspace<0 to mean that the number of bytes used per input
value is the additive inverse of the value, or something like that.
I really like this idea. Currently, aggtransspace represents an
estimate of the transition state size provided by the aggregate
definition. If it's set to zero, a default estimate based on the
state data type is used. Negative values currently have no defined
meaning. I think it makes perfect sense to reuse this field so that
a negative value indicates that the transition state data can grow
unboundedly in size.
Attached 0002 implements this idea. It requires fewer code changes
than I expected. This is mainly because our current code uses
aggtransspace in such a way that a positive value is used as provided
by the aggregate definition; otherwise, some
heuristics are applied to estimate the size. For the aggregates that
accumulate input rows (e.g., array_agg, string_agg), I don't currently
have a better heuristic for estimating their size, so I've chosen to
keep the current logic. This won't regress anything in estimating
transition state data size.
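For what it's worth, the aggregates carrying an explicit estimate can
be inspected directly (a quick catalog query, not part of the patch):

  -- List aggregates whose declared transition-state size is nonzero;
  -- under this proposal, negative values would flag unbounded states.
  SELECT aggfnoid::regprocedure, aggtransspace
  FROM pg_aggregate
  WHERE aggtransspace <> 0;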
- Richard
Attachments:
v22-0001-Implement-Eager-Aggregation.patch
From 8a780d897ec5205a48867f3dc291edf80707aca3 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v22 1/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 +++++
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 323 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1584 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 225 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3951 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 18d727d7790..f1b2d684e35 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2a3685f474a..bac3c3270a0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
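+As a concrete illustration (with invented table names), consider:
+
+SELECT c.name, SUM(o.amount)
+FROM orders o JOIN customers c ON o.cust_id = c.id
+GROUP BY c.name;
+
+Here R1 is orders (it contains the aggregation column o.amount) and
+R2 is customers.  G1 must contain o.cust_id, since it appears in the
+join condition J.  Partial SUMs are computed per o.cust_id, joined
+to customers, and then summed again per c.name to finalize.
+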
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any null-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..1b778f692d4 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,324 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with AVG_ACCUM transition function from
+ * this check, based on the assumption that avg() and sum() are safe in this
+ * context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
+ transinfo->transfn_oid == F_INT8_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
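
The reason the group-count estimation moves around above: with eager
aggregation, the final aggregation may be fed by a partially grouped
relation, so its number of output groups must be estimated from the
partially grouped path's row count (dNumFinalGroups) rather than from the
plain input's (dNumGroups). A rough sketch, with invented numbers:

    plain join output:             ~100000 rows  -> drives dNumGroups
    join over partial aggregation: ~1000 rows    -> drives dNumFinalGroups

Sizing the Finalize step from the plain join's row count would badly
overestimate its input.
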
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
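
The two mutator cases above let a parent relation's RelAggInfo, and the
PathTargets inside it, be translated to its child rels, which is what
allows eager aggregation to cooperate with partitionwise joins. For
example (hypothetical partitioned tables pt1 and pt2):

    SET enable_partitionwise_join = on;
    SELECT t1.x, sum(t2.y)
    FROM pt1 t1 JOIN pt2 t2 ON t1.x = t2.y
    GROUP BY t1.x;

Here the RelAggInfo built for the parent t2 is adjusted for each child
partition rather than rebuilt from scratch; see the IS_OTHER_REL branch in
create_rel_agg_info() below.
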
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we would not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one, whose
+ * reltarget has already been constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent must already
+ * have been created if that was possible, so we can just use the
+ * parent's RelAggInfo, if any, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent the grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match found */
+ return 0;
+}
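
Two of the checks in this file are perhaps easiest to see with examples
(illustrative schemas). First, init_grouping_targets() adds t2.b below to
the partial grouping keys even though it is not in GROUP BY, because it is
the join key: an aggregated t2 row must match t1 exactly when every row in
its partial group does.

    SELECT t1.a, avg(t2.c)
    FROM t1 JOIN t2 ON t1.b = t2.b
    GROUP BY t1.a;

Second, eager_aggregation_possible_for_relation() refuses to push the
partial aggregation down to the nullable side of an outer join, since the
NULL-extended rows the join produces for unmatched t1 rows would never
reach a pushed-down aggregate over t2.

    SELECT t2.b, avg(t2.c)
    FROM t1 LEFT JOIN t2 ON t1.b = t2.b
    GROUP BY t2.b;
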
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0da01627cfe..f35dd1b23bf 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 26c08693564..7325bcd439d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
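
Both GUCs are USERSET and can be adjusted per session, e.g. (16.0 is just
an illustrative value):

    SET enable_eager_aggregate = off;      -- turn the feature off
    SET min_eager_agg_group_size = 16.0;   -- demand larger average groups
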
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
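
As an example of how agg_eval_at drives placement: in the query below,
taken from the new regression test, avg(t2.c + t3.c) has agg_eval_at
covering {t2, t3}, so the partial aggregation may be placed at the t2/t3
join level or above it, but not on t2 or t3 alone.

    SELECT t1.a, avg(t2.c + t3.c)
    FROM eager_agg_t1 t1
         JOIN eager_agg_t2 t2 ON t1.b = t2.b
         JOIN eager_agg_t3 t3 ON t2.a = t3.a
    GROUP BY t1.a;
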
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..0dab585e9ce
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1584 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Ensure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..8b1049ae3f3
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,225 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
Attachment: v22-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch (application/octet-stream)
From ec282bb7fb963325a30a3e94375289aa5457004b Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 12 Sep 2025 13:11:47 +0900
Subject: [PATCH v22 2/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/backend/optimizer/plan/initsplan.c | 23 +++++++----------------
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
7 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1b778f692d4..cb29c72c96c 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -716,19 +716,14 @@ setup_eager_aggregation(PlannerInfo *root)
/*
* is_partial_agg_memory_risky
- * Checks if any aggregate poses a risk of excessive memory usage during
+ * Check if any aggregate poses a risk of excessive memory usage during
* partial aggregation.
*
- * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
- * is marked as pass-by-value, it usually points to a large internal data
- * structure (like those used by string_agg or array_agg). These transition
- * states can grow large and their size is hard to estimate. Applying eager
- * aggregation in such cases risks high memory usage since partial aggregation
- * results might be stored in join hash tables or materialized nodes.
- *
- * We explicitly exclude aggregates with AVG_ACCUM transition function from
- * this check, based on the assumption that avg() and sum() are safe in this
- * context.
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
*/
static bool
is_partial_agg_memory_risky(PlannerInfo *root)
@@ -739,11 +734,7 @@ is_partial_agg_memory_risky(PlannerInfo *root)
{
AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
- if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
- transinfo->transfn_oid == F_INT8_AVG_ACCUM)
- continue;
-
- if (transinfo->aggtranstype == INTERNALOID)
+ if (transinfo->aggtransspace < 0)
return true;
}
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ef0d0f92165..62b0af3e0c3 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202509091
+#define CATALOG_VERSION_NO 202509121
#endif
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
On Fri, Sep 12, 2025 at 5:34 AM Richard Guo <guofenglinux@gmail.com> wrote:
I really like this idea. Currently, aggtransspace represents an
estimate of the transition state size provided by the aggregate
definition. If it's set to zero, a default estimate based on the
state data type is used. Negative values currently have no defined
meaning. I think it makes perfect sense to reuse this field so that
a negative value indicates that the transition state data can grow
unboundedly in size.
Attached 0002 implements this idea. It requires fewer code changes
than I expected. This is mainly because our current code uses
aggtransspace in such a way that if it's a positive value, that value
is used as it's provided by the aggregate definition; otherwise, some
heuristics are applied to estimate the size. For the aggregates that
accumulate input rows (e.g., array_agg, string_agg), I don't currently
have a better heuristic for estimating their size, so I've chosen to
keep the current logic. This won't regress anything in estimating
transition state data size.
This might be OK, but it's not what I was suggesting: I was suggesting
trying to do a calculation like space_used = -aggtransspace *
rowcount, not just using a <0 value as a sentinel.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Sep 13, 2025 at 3:48 AM Robert Haas <robertmhaas@gmail.com> wrote:
This might be OK, but it's not what I was suggesting: I was suggesting
trying to do a calculation like space_used = -aggtransspace *
rowcount, not just using a <0 value as a sentinel.
I've considered your suggestion, but I'm not sure I'll adopt it in the
end. Here's why:
1) At the point where we check whether any aggregates might pose a
risk of excessive memory usage during partial aggregation, row count
information is not yet available. You could argue that we could
reorganize the logic to perform this check after we have the row
count, but that seems quite tricky. If I understand correctly, the
"rowcount" in this context actually means the number of rows within
one partial group. That would require us to first decide on the
grouping expressions for the partial aggregation, then compute the
group row counts, then estimate space usage, and only then decide
whether memory usage is excessive and fall back. This would come
quite late in planning and add nontrivial overhead, compared to the
current approach, which checks at the very beginning.
2) Even if we were able to estimate space usage based on the number of
rows per partial group and determined that memory usage seems
acceptable, we still couldn't guarantee that the transition state data
won't grow excessively after further joins. Joins can multiply
partial aggregates, potentially causing a blowup in memory usage even
if the initial estimate seemed safe.
3) I don't think "-aggtransspace * rowcount" reflects the true memory
footprint for aggregates that accumulate input rows. For example,
what if we have an aggregate like string_agg(somecolumn, 'a very long
delimiter')? (See the example below this list.)
4) AFAICS, the main downside of the current approach compared to yours
is that it avoids pushing down aggregates like string_agg() that
accumulate input rows, whereas your suggestion might allow pushing
them down in some cases where we *think* it wouldn't blow up memory.
You might argue that the current implementation is over-conservative.
But I prefer to start safe.
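To make point 3 concrete, consider a hypothetical case like:

  -- every input row appends ~1kB of delimiter to the transition
  -- state, no matter how narrow somecolumn itself is
  SELECT g, string_agg(somecolumn, repeat('x', 1024))
  FROM some_table
  GROUP BY g;

A per-row constant stored in the catalog cannot anticipate the
delimiter chosen at the call site, so "-aggtransspace * rowcount"
would underestimate the state growth here.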
That said, I appreciate you proposing the idea of reusing
aggtransspace, although I ended up using it in a different way than
you suggested.
- Richard
I've run TPC-DS again to compare planning times with and without eager
aggregation. Out of 99 queries, only one query (query 64) shows a
noticeable increase in planning time. This query performs inner joins
across 38 tables. This is a very large search space. (I'm talking
about the standard join search method, not the GEQO.)
If my math doesn't fail me, the maximum number of different join
orders when joining n tables is: Catalan(n − 1) x n!. For n = 38,
this number is astronomically large. In practice, query 64 joins 19
tables twice (due to a CTE), which still results in about 3.4E28
different join orders.
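As a quick sanity check of that formula on a small case: n! x
Catalan(n - 1) simplifies to (2n - 2)!/(n - 1)!, so for n = 4 we
expect 6!/3! = 120 distinct join orders, which is easy to verify:

  SELECT factorial(2*4 - 2) / factorial(4 - 1) AS join_orders;  -- 120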
Of course, in practice, with the help of join_collapse_limit and other
heuristics, the effective search space is reduced a lot, but even
then, it remains very large. Given this, I'm not too surprised that
query 64 shows an increase in planning time when eager aggregation is
applied -- exploring the best join order in such a space is inherently
expensive.
That said, I've identified a few performance hotspots that can be
optimized to help reduce planning time:
1) the exprs_known_equal() call in get_expression_sortgroupref(),
which is used to check if a given expression is known equal to a
grouping expression due to ECs. We can optimize this by storing the
EC of each grouping expression, and then get_expression_sortgroupref()
would only need to search the relevant EC, rather than scanning all of
them.
2) the estimate_num_groups() call in create_rel_agg_info(). We can
optimize this by avoiding unnecessary calls to estimate_num_groups()
where possible.
Attached is an updated version of the patch with these optimizations
applied. With this patch, the planning times for query 64, with and
without eager aggregation, are:
-- with eager aggregation
Planning Time: 9432.042 ms
-- without eager aggregation
Planning Time: 7196.999 ms
I think the increase in planning time is acceptable given the large
search space involved, though I may be biased.
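(For anyone who wants to reproduce such a comparison, one way to do
it, using the GUC added by the patch and substituting in query 64, is:

  SET enable_eager_aggregate = on;   -- then repeat with off
  EXPLAIN (SUMMARY) <query 64>;      -- reports "Planning Time"

This measures planning time without executing the query.)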
- Richard
Attachments:
v23-0001-Implement-Eager-Aggregation.patch
From 63d36fe266e5c8ab19079698a3ea5e9abb3218bd Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v23 1/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 469 +++++
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 379 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 51 +
src/backend/optimizer/util/relnode.c | 650 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 121 ++
src/include/optimizer/pathnode.h | 6 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1584 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 225 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 4029 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6dc04e916dc..f5a57b9cbd5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..39e658b7808 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any null-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..ee298970427 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,344 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+ List *group_pathkeys = NIL;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping, and generate the pathkeys that represent the grouping
+ * requirements in that case.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+ if (can_sort)
+ {
+ RelOptInfo *top_grouped_rel;
+ List *top_group_tlist;
+
+ top_grouped_rel = IS_OTHER_REL(rel) ?
+ rel->top_parent->grouped_rel : grouped_rel;
+ top_group_tlist =
+ make_tlist_from_pathtarget(top_grouped_rel->agg_info->target);
+
+ group_pathkeys =
+ make_pathkeys_for_sortclauses(root, agg_info->group_clauses,
+ top_group_tlist);
+ }
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3915,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3939,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4829,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..240eda53696 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel, rel1_empty == rel2_empty);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..1af43bb60d2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,12 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
+static EquivalenceClass *get_eclass_for_sortgroupclause(PlannerInfo *root,
+ SortGroupClause *sgc,
+ Expr *expr);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +636,377 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with AVG_ACCUM transition function from
+ * this check, based on the assumption that avg() and sum() are safe in this
+ * context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
+ transinfo->transfn_oid == F_INT8_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *ecs = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ ecs = lappend(ecs, get_eclass_for_sortgroupclause(root, sgc, tle->expr));
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, ecs)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->ec = ec;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
+/*
+ * get_eclass_for_sortgroupclause
+ * Given a group clause and an expression, find an existing equivalence
+ * class that the expression is a member of; return NULL if none.
+ */
+static EquivalenceClass *
+get_eclass_for_sortgroupclause(PlannerInfo *root, SortGroupClause *sgc,
+ Expr *expr)
+{
+ Oid opfamily,
+ opcintype,
+ collation;
+ CompareType cmptype;
+ Oid equality_op;
+ List *opfamilies;
+
+ /* Punt if the group clause is not sortable */
+ if (!OidIsValid(sgc->sortop))
+ return NULL;
+
+ /* Find the operator in pg_amop --- failure shouldn't happen */
+ if (!get_ordering_op_properties(sgc->sortop,
+ &opfamily, &opcintype, &cmptype))
+ elog(ERROR, "operator %u is not a valid ordering operator",
+ sgc->sortop);
+
+ /* Because SortGroupClause doesn't carry collation, consult the expr */
+ collation = exprCollation((Node *) expr);
+
+ /*
+ * EquivalenceClasses need to contain opfamily lists based on the family
+ * membership of mergejoinable equality operators, which could belong to
+ * more than one opfamily. So we have to look up the opfamily's equality
+ * operator and get its membership.
+ */
+ equality_op = get_opfamily_member_for_cmptype(opfamily,
+ opcintype,
+ opcintype,
+ COMPARE_EQ);
+ if (!OidIsValid(equality_op)) /* shouldn't happen */
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ COMPARE_EQ, opcintype, opcintype, opfamily);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ if (!opfamilies) /* certainly should find some */
+ elog(ERROR, "could not find opfamilies for equality operator %u",
+ equality_op);
+
+ /* Now find a matching EquivalenceClass */
+ return get_eclass_for_sort_expr(root, expr, opfamilies, opcintype,
+ collation, sgc->tleSortGroupRef,
+ NULL, false);
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..69b8b0c2ae0 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,57 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = oldinfo->group_clauses;
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..e5bab59fbbe 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+	 * We should have available aggregate expressions and grouping
+	 * expressions; otherwise we would not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel, true);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ *	  Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,536 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ *
+ * calculate_grouped_rows: if true, calculate the estimated number of grouped
+ * rows for the relation. If false, skip the estimation to avoid unnecessary
+ * planning overhead.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+	 * If this is a child rel, the grouped rel for its parent rel has
+	 * already been created if that was possible.  So we can simply reuse
+	 * the parent's RelAggInfo, if there is one, with appropriate variable
+	 * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful
+ * iff the average group size is no less than
+ * min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+ result->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+	 * top-level aggregation.  Pushing the partial aggregation down anyway
+	 * could group the rows differently than expected, or produce incorrect
+	 * values from the aggregate functions.
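+	 * For example, in "SELECT t2.b, avg(t2.c) FROM t1 LEFT JOIN t2 ON
+	 * t1.b = t2.b GROUP BY t2.b" (see the eager_aggregate regression
+	 * test), t2 is on the nullable side: the NULL-extended rows the join
+	 * produces for unmatched t1 rows only exist above the join, so a
+	 * partial aggregation of t2 could never include them.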
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+	 * one.  If the aggregate requires the current relation plus additional
+	 * relations, partially aggregating the current relation would collapse
+	 * rows that the aggregate still needs to see individually.  If the
+	 * aggregate does not reference the current relation at all, the
+	 * relation should not be grouped, as we do not support joining two
+	 * grouped relations.
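+	 * For example, avg(t2.c + t3.c) has agg_eval_at = {t2, t3}: it can be
+	 * partially aggregated at the {t2, t3} join rel, but not at t2 or t3
+	 * alone.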
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
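+			 * For example, when t2 is partially aggregated below a join on
+			 * "t1.b = t2.b" while the query groups by t1.a, t2.b must be
+			 * added as a grouping key so that every row in a partial group
+			 * matches the same t1 rows.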
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
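+			 * (A well-known example is numeric, where 1.0 and 1.00 compare
+			 * as equal but have different stored representations.)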
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ *	  Check whether the given Var appears in some aggregate expression and
+ *	  nowhere else in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
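+	/*
+	 * lc is non-NULL only if the loop above broke upon finding the Var in
+	 * some aggregate; in addition, the Var must not be referenced directly
+	 * in the targetlist or havingQual.
+	 */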
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
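+ *	  For example, given "GROUP BY t2.y" and a join clause "t1.x = t2.y",
+ *	  t1.x is equal to the grouping expression t2.y via their
+ *	  EquivalenceClass and thus inherits its sortgroupref.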
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ Assert(IsA(expr, Var));
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+ ListCell *lc1;
+
+ Assert(IsA(ge_info->expr, Var));
+ Assert(ge_info->sortgroupref > 0);
+
+ if (equal(expr, ge_info->expr))
+ return ge_info->sortgroupref;
+
+ if (ge_info->ec == NULL ||
+ !bms_is_member(((Var *) expr)->varno, ge_info->ec->ec_relids))
+ continue;
+
+ /*
+ * Scan the EquivalenceClass, looking for a match to the given
+ * expression. We ignore child members here.
+ */
+ foreach(lc1, ge_info->ec->ec_members)
+ {
+ EquivalenceMember *em = (EquivalenceMember *) lfirst(lc1);
+
+ /* Child members should not exist in ec_members */
+ Assert(!em->em_is_child);
+
+ if (equal(expr, em->em_expr))
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..b176d5130e4 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+  short_desc => 'Enables the planner\'s use of eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c36fcb9ab61..c5d612ab552 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b12a2508d8c..2786f8f0c4d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -391,6 +391,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1040,6 +1049,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1124,6 +1141,67 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "group_clauses" and "group_exprs" are lists of SortGroupClauses and the
+ * corresponding grouping expressions.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
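+ * For example, with 1000 input rows and an estimated 125 groups, the
+ * average group size is 1000 / 125 = 8, which meets the default
+ * min_eager_agg_group_size of 8.0.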
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+	/* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3268,6 +3346,49 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
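+ * For example, in "SELECT t1.a, avg(t2.c) FROM t1 JOIN t2 ON t1.b = t2.b
+ * GROUP BY t1.a", the Aggref avg(t2.c) has agg_eval_at = {t2}, so its
+ * partial aggregation can be pushed down as far as the scan of t2.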
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the EquivalenceClass it belongs to. This information is necessary to
+ * reproduce correct grouping semantics at different levels of the join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* the equivalence class the expression belongs to */
+ EquivalenceClass *ec pg_node_attr(copy_as_scalar, equal_as_scalar);
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..e509b8144ce 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,6 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..0dab585e9ce
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1584 @@
+--
+-- EAGER AGGREGATION
+-- Test that aggregation can be pushed down below a join
+--
+-- Make sure eager aggregation is enabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, aggregate the result, join it to the other table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, aggregate the result, join it to the remaining table,
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer joins
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY on a different key that still matches the partition key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cd37f549b5a..bdbf21a874d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2840,20 +2840,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index cb12bf53719..fc84929a002 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..8b1049ae3f3
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,225 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..09752d57da4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
[Attachment: v23-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch]
From 48b807a93c29c534c0151b950563b28021acd8c1 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 12 Sep 2025 13:11:47 +0900
Subject: [PATCH v23 2/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/backend/optimizer/plan/initsplan.c | 23 +++++++----------------
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
6 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1af43bb60d2..b8d1c7e88a3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -719,19 +719,14 @@ setup_eager_aggregation(PlannerInfo *root)
/*
* is_partial_agg_memory_risky
- * Checks if any aggregate poses a risk of excessive memory usage during
+ * Check if any aggregate poses a risk of excessive memory usage during
* partial aggregation.
*
- * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
- * is marked as pass-by-value, it usually points to a large internal data
- * structure (like those used by string_agg or array_agg). These transition
- * states can grow large and their size is hard to estimate. Applying eager
- * aggregation in such cases risks high memory usage since partial aggregation
- * results might be stored in join hash tables or materialized nodes.
- *
- * We explicitly exclude aggregates with AVG_ACCUM transition function from
- * this check, based on the assumption that avg() and sum() are safe in this
- * context.
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
*/
static bool
is_partial_agg_memory_risky(PlannerInfo *root)
@@ -742,11 +737,7 @@ is_partial_agg_memory_risky(PlannerInfo *root)
{
AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
- if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
- transinfo->transfn_oid == F_INT8_AVG_ACCUM)
- continue;
-
- if (transinfo->aggtranstype == INTERNALOID)
+ if (transinfo->aggtransspace < 0)
return true;
}
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
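With this change, aggregates whose transition state can grow without
bound can be identified directly from the catalog. As a quick sanity
check (assuming the patch is applied):

SELECT aggfnoid::regprocedure
FROM pg_aggregate
WHERE aggtransspace < 0;

This should return exactly the array_agg and string_agg variants
updated above.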
On Thu, Sep 25, 2025 at 1:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated version of the patch with these optimizations
applied.
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
- Richard
[ getting back to testing this patch ...]
To my last email you replied:
Debugging this query shows that all the if conditions in
setup_eager_aggregation() return false, and create_agg_clause_infos()
and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
is also being set to true, so I would expect to see Finalize and Partial
agg nodes. Is this correct, or am I missing something here?
Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan. This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.
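If you want to verify which plan won on cost, a simple way is to toggle
the new GUC and compare the EXPLAIN output; t1/t2 below are hypothetical
tables:

SET enable_eager_aggregate TO on;
EXPLAIN SELECT t1.a, sum(t2.b)
  FROM t1 JOIN t2 ON t1.a = t2.a GROUP BY t1.a;
SET enable_eager_aggregate TO off;
EXPLAIN SELECT t1.a, sum(t2.b)
  FROM t1 JOIN t2 ON t1.a = t2.a GROUP BY t1.a;

If the second plan's total cost is lower, the planner will simply not
choose the eager-aggregation path even though it was generated.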
Sorry for the noise here. I didn't consider the costs.
On Sun Sep 28, 2025 at 11:09 PM -03, Richard Guo wrote:
On Thu, Sep 25, 2025 at 1:23 PM Richard Guo <guofenglinux@gmail.com> wrote:
Attached is an updated version of the patch with these optimizations
applied.
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
I spent some time testing patch v23 using the TPC-DS benchmark and am
seeing worse execution times when using eager aggregation.
The most interesting cases are:
Query    | planning time | execution time
query 31 | -2.03%        | -99.56%
query 71 | -15.51%       | -68.88%
query 20 | -10.77%       | -32.40%
query 26 | -28.01%       | -32.35%
query 85 | -10.57%       | -31.91%
query 77 | -30.07%       | -31.38%
query 69 | -32.79%       | -29.21%
query 32 | -68.48%       | -27.89%
query 57 | -7.99%        | -27.32%
query 91 | -24.81%       | -26.20%
query 23 | -11.72%       | -18.24%
Query 31 seems bad. I don't know if I'm doing something completely
wrong, but I just set up a TPC-DS database and then executed the query
on master and with the v23 patch, and I got these results:
Master:
Planning Time: 3.191 ms
Execution Time: 16950.619 ms
Patch:
Planning Time: 3.257 ms
Execution Time: 3848355.646 ms
Note that I've executed ANALYZE before running the queries on both
scenarios (master and patched).
I'm attaching the EXPLAIN (ANALYZE) output for query 31 from master
and with the patch applied.
Please let me know if there is any other test that I can run to
benchmark this patch.
--
Matheus Alcantara
On Thu, Oct 2, 2025 at 8:55 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
Query 31 seems bad. I don't know if I'm doing something completely
wrong, but I just set up a TPC-DS database and then executed the query
on master and with the v23 patch, and I got these results:
Master:
Planning Time: 3.191 ms
Execution Time: 16950.619 ms
Patch:
Planning Time: 3.257 ms
Execution Time: 3848355.646 ms
Thanks for reporting this. It does seem odd. I checked the TPC-DS
benchmarking on v13 and found that the execution time for query 31,
with and without eager aggregation, is as follows:
EAGER-AGG-OFF EAGER-AGG-ON
q31 10463.536 ms 10244.175 ms
There appears to be a regression between v13 and v23. Looking into
it...
- Richard
On Thu, Oct 2, 2025 at 10:13 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Thu, Oct 2, 2025 at 8:55 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
Query 31 seems bad. I don't know if I'm doing something completely
wrong, but I just set up a TPC-DS database and then executed the query
on master and with the v23 patch, and I got these results:
Master:
Planning Time: 3.191 ms
Execution Time: 16950.619 ms
Patch:
Planning Time: 3.257 ms
Execution Time: 3848355.646 ms
Thanks for reporting this. It does seem odd. I checked the TPC-DS
benchmarking on v13 and found that the execution time for query 31,
with and without eager aggregation, is as follows:
EAGER-AGG-OFF EAGER-AGG-ON
q31 10463.536 ms 10244.175 ms
There appears to be a regression between v13 and v23. Looking into
it...
I noticed something interesting while comparing the two EXPLAIN
(ANALYZE) outputs: the patched version uses parallel plans, whereas
the master does not. To rule that out as a factor, I ran "SET
max_parallel_workers_per_gather TO 0;" and re-ran query 31 on both
master and the patched version. This time, I got a positive result.
-- on master
Planning Time: 5.281 ms
Execution Time: 7222.665 ms
-- on patched
Planning Time: 4.855 ms
Execution Time: 5977.287 ms
It seems eager aggregation doesn't cope well with parallel plans for
this query. Looking into it.
- Richard
On Thu, Oct 2, 2025 at 10:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
It seems eager aggregation doesn't cope well with parallel plans for
this query. Looking into it.
It turns out that this is not related to parallel plans but rather to
poor size estimates.
Looking at query 31, it involves joining 6 base relations, all of
which are CTE references (i.e., RTE_CTE relations) to two different
CTEs. Each CTE involves aggregations and GROUP BY clauses.
Unfortunately, our size estimates for CTE relations are quite poor,
especially when the CTE uses GROUP BY. In these cases, we don't have
any ANALYZE statistics available (cf. examine_simple_variable). As a
result, when computing the selectivity of the CTE relation's qual
clauses, we have to fall back on default values. For example, for
quals like "CTE.var = const", which are used a lot in query 31, the
selectivity is computed as "1.0 / DEFAULT_NUM_DISTINCT(200)", with the
assumption that there are DEFAULT_NUM_DISTINCT distinct values in the
relation, and that these values are equally common (cf. var_eq_const).
The consequence is that the size estimates are significantly different
from the actual values. For example, from the EXPLAIN(ANALYZE) output
provided by Matheus:
-> CTE Scan on ws ws3 (cost=0.00..1797.35 rows=2 width=110)
(actual time=0.001..74.725 rows=1261.00 loops=1)
Filter: ((d_year = 1999) AND (d_qoy = 3))
Interestingly, with eager aggregation applied, the row count estimates
for the two CTE plans actually become closer to the actual values.
-- without eager aggregation
CTE ws
-> HashAggregate (cost=96009.03..114825.35 rows=718952 width=54)
(actual time=977.215..1014.889 rows=23320.00 loops=1)
-- with eager aggregation
CTE ws
-> Finalize GroupAggregate (cost=52144.19..62314.79 rows=71894 width=54)
(actual time=275.121..340.107 rows=23312.00 loops=1)
However, due to the highly underestimated selectivity for the qual
clauses, the row count estimates for CTE Scan nodes become worse.
This is because:
-- without eager aggregation
718952 * (1.0/200) * (1.0/200) ~= 18
-- with eager aggregation
71894 * (1.0/200) * (1.0/200) ~= 2
... while the actual row count is 1261.00 as shown above.
That is to say, on master, the CTE plan rows are overestimated while
the selectivity estimates are severely underestimated. With eager
aggregation, the CTE plan rows become closer to the actual values, but
the selectivity estimates remain equally underestimated. As a result,
the row count estimates for the CTE Scan nodes worsen with eager
aggregation. This causes the join order in the final plan to change
when eager aggregation is applied, leading to longer execution times
in this case.
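To see the fallback in isolation, here's a minimal sketch against the
TPC-DS schema (not the exact CTE from query 31; MATERIALIZED forces an
RTE_CTE relation, for which no ANALYZE statistics are available):

EXPLAIN
WITH g AS MATERIALIZED (
  SELECT d_year, d_qoy, sum(ss_ext_sales_price) AS total
  FROM store_sales JOIN date_dim ON ss_sold_date_sk = d_date_sk
  GROUP BY d_year, d_qoy
)
SELECT * FROM g WHERE d_year = 1999 AND d_qoy = 3;

The CTE Scan's row estimate comes out as (estimated CTE rows) *
(1/200) * (1/200), regardless of the actual distribution of d_year and
d_qoy.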
Another point to note is that, due to severely underestimated
selectivity estimates (0.000025, sometimes 0.000000125), the size
estimates for the CTE relations are very small, causing the planner to
tend to choose nestloops. I tried manually disabling nestloop, and
here are what I got for query 31.
-- on master, set enable_nestloop to on;
Planning Time: 4.613 ms
Execution Time: 7142.090 ms
-- on master, set enable_nestloop to off;
Planning Time: 4.315 ms
Execution Time: 2262.330 ms
-- on patched, set enable_nestloop to off;
Planning Time: 4.321 ms
Execution Time: 1214.376 ms
That is, on master, simply disabling nestloop makes query 31 run more
than 3 times faster. Enabling eager aggregation on top of that
improves performance further, making it run 1.86 times faster relative
to the nested-loop-disabled baseline.
I manually disabled nested loops for other TPC-DS queries on master
and discovered some additional interesting findings.
For query 4, on master:
-- set enable_nestloop to on
Planning Time: 3.054 ms
Execution Time: 3231356.258 ms
-- set enable_nestloop to off
Planning Time: 4.291 ms
Execution Time: 12751.170 ms
That is, on master, simply disabling nestloop makes query 4 run more
than 253 times faster.
For query 11, on master:
-- set enable_nestloop to on
Planning Time: 1.435 ms
Execution Time: 1824860.937 ms
-- set enable_nestloop to off
Planning Time: 2.479 ms
Execution Time: 7984.360 ms
Disabling nestloop makes query 11 run more than 228 times faster.
I believe you can find more such queries in TPC-DS if you keep
looking. Given this, I don't think it makes much sense to debug a
performance regression on TPC-DS with nestloop enabled.
Matheus, I wonder if you could help run TPC-DS again with this patch,
this time with nested loops disabled for all queries.
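For anyone reproducing these numbers, the only knobs changed in the
experiments above were:

SET enable_nestloop TO off;
SET max_parallel_workers_per_gather TO 0;  -- to rule out parallelism

with everything else left at defaults, after a fresh ANALYZE.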
- Richard
On Thu Oct 2, 2025 at 5:49 AM -03, Richard Guo wrote:
On Thu, Oct 2, 2025 at 10:39 AM Richard Guo <guofenglinux@gmail.com> wrote:
It seems eager aggregation doesn't cope well with parallel plans for
this query. Looking into it.
It turns out that this is not related to parallel plans but rather to
poor size estimates.
[ ... ]
Matheus, I wonder if you could help run TPC-DS again with this patch,
this time with nested loops disabled for all queries.
Thanks for all the details. I've disabled nested loops and executed
the benchmark again, and the results look much better! I see a 55%
improvement on query_31 on my machine now (macOS, M3 Max).
The only query where I see a considerable regression is query 23, which
shows a 23% worse execution time. I'm attaching the EXPLAIN (ANALYZE)
output from master and from the patched version in case it's interesting.
I'm also attaching a csv with the planning time and execution time from
master and the patched version for all queries. It contains the
percentage difference between the executions; negative numbers mean
that the patched version using eager aggregation is faster. (I loaded
this csv into a Postgres table and played with some queries to analyze
the results.)
I'm just wondering if there is anything that can be done in the planner
to prevent this type of situation?
--
Matheus Alcantara
On Fri, Oct 3, 2025 at 3:41 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
Thanks for all the details. I've disabled nested loops and executed
the benchmark again, and the results look much better! I see a 55%
improvement on query_31 on my machine now (macOS, M3 Max).
Great! That is 2.23 times faster.
The only query where I see a considerable regression is query 23, which
shows a 23% worse execution time. I'm attaching the EXPLAIN (ANALYZE)
output from master and from the patched version in case it's interesting.
I tested query 23 in my local environment but didn't observe the
regression.
-- on master
Planning Time: 1.950 ms
Execution Time: 3260.924 ms
-- on patched
Planning Time: 2.197 ms
Execution Time: 3237.287 ms
I ran the benchmark at scale factor 1 and executed ANALYZE beforehand.
For the build configuration, I disabled cassert.
Comparing the plans, I noticed one key difference: in the plan you
provided (query-23.patch.explain), the frequent_ss_items CTE uses
parallel aggregation, whereas in my local environment it does not.
This leads to a different final join order between the two plans.
However, given the highly inaccurate size and cost estimates for the
CTE Scan nodes, I'm not sure it's worth investigating further. I'm
starting to feel that trying to tune performance here, with such
inaccurate underlying estimates for CTEs, is like building on sand.
I'm also attaching a csv with the planning time and execution time from
master and the patched version for all queries. It contains the % of
difference between the executions. Negative numbers means that the
patched version using eager aggregation is faster. (I loaded this csv on
a postgres table and played with some queries to analyze the results).
I really appreciate this; it's very helpful.
I'm just wondering if there is anything that can be done on the planner
to prevent this type of situation?
I think the ideal solution is to improve our estimates for CTE
relations to make the plans for TPC-DS queries more reasonable. Of
course, for queries from other benchmarks, the issues may stem from
other plan nodes. IMHO, we really need some improvements in our cost
estimation.
- Richard
On Fri Oct 3, 2025 at 12:14 AM -03, Richard Guo wrote:
The only query where I see a considerable regression is query 23, which
shows a 23% worse execution time. I'm attaching the EXPLAIN (ANALYZE)
output from master and from the patched version in case it's interesting.
I tested query 23 in my local environment but didn't observe the
regression.
-- on master
Planning Time: 1.950 ms
Execution Time: 3260.924 ms
-- on patched
Planning Time: 2.197 ms
Execution Time: 3237.287 ms
I ran the benchmark at scale factor 1 and executed ANALYZE beforehand.
For the build configuration, I disabled cassert.
I've disabled cassert and executed ANALYZE again before benchmarking,
and now I have similar results, with an improvement for the eager
aggregation version:
-- master
Planning Time: 2.734 ms
Execution Time: 5238.128 ms
-- patched
Planning Time: 2.578 ms
Execution Time: 4732.584 ms
Comparing the plans, I noticed one key difference: in the plan you
provided (query-23.patch.explain), the frequent_ss_items CTE uses
parallel aggregation, whereas in my local environment it does not.
This leads to a different final join order between the two plans.
However, given the highly inaccurate size and cost estimates for the
CTE Scan nodes, I'm not sure it's worth investigating further. I'm
starting to feel that trying to tune performance here, with such
inaccurate underlying estimates for CTEs, is like building on sand.
[ ... ]
I'm just wondering if there is anything that can be done in the planner
to prevent this type of situation?
I think the ideal solution is to improve our estimates for CTE
relations to make the plans for TPC-DS queries more reasonable. Of
course, for queries from other benchmarks, the issues may stem from
other plan nodes. IMHO, we really need some improvements in our cost
estimation.
Fair points, agree.
The performance results look good to me. I don't have too many comments
about the code; although I'm still learning about the planner internals,
this patch seems in good shape to me.
I'm just attaching a new csv with the last results after running with
cassert disabled and after executing ANALYZE. It looks good to me.
Thanks for working on this!
--
Matheus Alcantara
Attachments:
On Sat, Oct 4, 2025 at 5:03 AM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
I've disabled cassert and executed ANALYZE again before benchmarking,
and now I have similar results, with an improvement for the eager
aggregation version:
-- master
Planning Time: 2.734 ms
Execution Time: 5238.128 ms
-- patched
Planning Time: 2.578 ms
Execution Time: 4732.584 ms
Great!
The performance results look good to me. I don't have too many comments
about the code; although I'm still learning about the planner internals,
this patch seems in good shape to me.
Thanks for running the benchmark and reviewing the patch.
I'm just attaching a new csv with the last results after running with
cassert disabled and after executing ANALYZE. It looks good to me.
Yeah, the results look good this time. There are no performance
regressions; on the contrary, several queries actually show really
nice improvements.
- Richard
On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
Barring any objections, I'll plan to push v23 in a couple of days.
- Richard
On Mon, 6 Oct 2025 at 13:59, Richard Guo <guofenglinux@gmail.com> wrote:
Barring any objections, I'll plan to push v23 in a couple of days.
Not a complete review, but a cursory look:
1. setup_base_grouped_rels() by name and the header comment claim to
operate on base relations, but the code seems to be coded to handle
OTHER_MEMBER rels too.
Note that set_base_rel_pathlists() explicitly skips anything that's
not RELOPT_BASEREL, so if you're not doing that, then you shouldn't
use "base" in the function name. It's confusing.
2. All the calls to generate_grouped_paths() pass the grouped_rel
RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
it that way rather than access the agg_info from the given grouped_rel
from within the function?
3. " * The information needed are provided by the RelAggInfo
structure." This should use "is" rather than "are"
4. standard_join_search(). I think it's worth getting rid of the
duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
How about storing that in a local variable rather than re-calling
bms_equal()? I don't believe the compiler will optimise the extra one
away as it can't know set_cheapest() doesn't change the relids. Also,
wouldn't it be better to check rel->grouped_rel != NULL first? Won't
that be NULL in most cases, whereas !bms_equal(rel->relids,
root->all_query_rels) will be true in most cases? Likewise in
generate_partitionwise_join_paths().
5. Wouldn't it be better to do 0002 first and get that into core so
you don't have to do the hacky stuff in is_partial_agg_memory_risky()?
6. Shouldn't this be using lappend()?
agg_clause_list = list_append_unique(agg_clause_list, ac_info);
I don't understand why ac_info could already be in the list. You've
just done: ac_info = makeNode(AggClauseInfo);
7. The following comment talks about "base" relations. I don't think
it should be as the RelOptInfo can be an OTHER_MEMBER rel.
* build_simple_grouped_rel
* Construct a new RelOptInfo representing a grouped version of the input
* base relation.
*/
8. Normally we check the List is NIL instead of:
if (list_length(group_clauses) == 0)
9. In get_expression_sortgroupref(), a comment claims "We ignore child
members here.". I think that's outdated since ec_members no longer has
child members.
10. I don't think this comment quite makes sense:
* "apply_at" tracks the lowest join level at which partial aggregation is
* applied.
maybe "minimum set of rels to join before partial aggregation can be applied"?
or at least swap "is" for "can be".
My confusion comes from the fact you're stating "lowest join level",
which seems to indicate that it could be applied after further
relations have been joined, but then you're saying "is applied" to
indicate that it can only be applied at that level.
11. The way you've written the header comments for typedef struct
RelAggInfo seems weird. I've only ever seen extra details in the
header comment when the inline comments have been kept to a single
line. You're spanning multiple lines, so why have the out of line
comments in the header at all?
12. This just doesn't feel like the right name for this field:
/* lowest level partial aggregation is applied at */
Relids apply_at;
I can't help but think that it should be something like "agg_relids" or
"required_relids". I understand you're currently only applying the
partial grouping when you get exactly the minimum set of relids in the
join search, but if this can be made fast enough, I expect that could
be changed in the future. If you do change it, then "apply_at" is a
pretty confusing name. Perhaps I've misunderstood here and if you did
that, you'd need to create another RelAggInfo to represent that?
13. Parameter names mismatch between definition and declaration in:
extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);
extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);
extern void generate_grouped_paths(PlannerInfo *root,
RelOptInfo *rel_grouped,
RelOptInfo *rel_plain,
RelAggInfo *agg_info);
14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
the output kinda huge. It might also be nice to wrap the long queries
onto multiple lines to make them easier to read.
David
On Mon, Oct 6, 2025 at 10:59 PM David Rowley <dgrowleyml@gmail.com> wrote:
Not a complete review, but a customary look:
Thanks for all the comments! They've been very helpful.
1. setup_base_grouped_rels() by name and the header comment claim to
operate on base relations, but the code seems to be coded to handle
OTHER_MEMBER rels too.
Indeed. I renamed it to setup_simple_grouped_rels() and updated the
related comments in v24.
2. All the calls to generate_grouped_paths() pass the grouped_rel
RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
it that way rather than access the agg_info from the given grouped_rel
from within the function?
Thanks. Fixed by removing the agg_info parameter.
3. " * The information needed are provided by the RelAggInfo
structure." This should use "is" rather than "are"
Yes.
4. standard_join_search(). I think it's worth getting rid of the
duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
How about storing that in a local variable rather than re-calling
bms_equal()? I don't believe the compiler will optimise the extra one
away as it can't know set_cheapest() doesn't change the relids. Also,
wouldn't it be better to check rel->grouped_rel != NULL first? Won't
that be NULL in most cases, whereas !bms_equal(rel->relids,
root->all_query_rels) will be true in most cases? Likewise in
generate_partitionwise_join_paths().
Good point. Done that way in v24.
5. Wouldn't it be better to do 0002 first and get that into core so
you don't have to do the hacky stuff in is_partial_agg_memory_risky()?
Agreed. Done in v24.
6. Shouldn't this be using lappend()?
agg_clause_list = list_append_unique(agg_clause_list, ac_info);
I don't understand why ac_info could already be in the list. You've
just done: ac_info = makeNode(AggClauseInfo);
A query can specify the same Aggref expressions multiple times in the
target list. Using lappend here can lead to duplicate partial Aggref
nodes in the targetlist of a grouped path, which is what I want to
avoid.
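For example, in a hypothetical query like

SELECT t1.a, sum(t2.b), sum(t2.b) + 1
FROM t1 JOIN t2 ON t1.a = t2.a
GROUP BY t1.a;

the Aggref for sum(t2.b) appears twice in the target list, and the two
nodes compare equal(), so list_append_unique collapses them into a
single AggClauseInfo.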
7. The following comment talks about "base" relations. I don't think
it should be as the RelOptInfo can be an OTHER_MEMBER rel.
* build_simple_grouped_rel
* Construct a new RelOptInfo representing a grouped version of the input
* base relation.
*/
Fixed in v24.
8. Normally we check the List is NIL instead of:
if (list_length(group_clauses) == 0)
Right. Updated in v24.
9. In get_expression_sortgroupref(), a comment claims "We ignore child
members here.". I think that's outdated since ec_members no longer has
child members.
I think that comment is used to explain why we only scan ec_members
here. Similar comments can be found in many other places, such as in
equivclass.c:
/*
* Found our match. Scan the other EC members and attempt to generate
* joinclauses. Ignore children here.
*/
foreach(lc2, cur_ec->ec_members)
{
10. I don't think this comment quite makes sense:
* "apply_at" tracks the lowest join level at which partial aggregation is
* applied.
maybe "minimum set of rels to join before partial aggregation can be applied"?
or at least swap "is" for "can be".
My confusion comes from the fact you're stating "lowest join level",
which seems to indicate that it could be applied after further
relations have been joined, but then you're saying "is applied" to
indicate that it can only be applied at that level.
11. The way you've written the header comments for typedef struct
RelAggInfo seems weird. I've only ever seen extra details in the
header comment when the inline comments have been kept to a single
line. You're spanning multiple lines, so why have the out of line
comments in the header at all?
12. This just doesn't feel like the right name for this field:
/* lowest level partial aggregation is applied at */
Relids apply_at;
I can't help but think that it should be something like "agg_relids" or
"required_relids". I understand you're currently only applying the
partial grouping when you get exactly the minimum set of relids in the
join search, but if this can be made fast enough, I expect that could
be changed in the future. If you do change it, then "apply_at" is a
pretty confusing name. Perhaps I've misunderstood here and if you did
that, you'd need to create another RelAggInfo to represent that?
Hmm, RelAggInfo is a per-relation structure; each grouped relation has
a valid RelAggInfo. The apply_at field represents the set of relids
where partial aggregation is applied within the paths of this grouped
relation. If we ever change this approach and allow the planner to
explore all join levels for placing partial aggregation, the apply_at
field will become obsolete (cf. the patch versions prior to v17).
I've updated the comment for apply_at to clarify that it refers to the
relids at which partial aggregation is applied.
I've also updated the comments within RelAggInfo to use one-line
style.
I retained the name of this field though.
13. Parameter names mismatch between definition and declaration in:
extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);
extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
RelOptInfo *rel_plain);
extern void generate_grouped_paths(PlannerInfo *root,
RelOptInfo *rel_grouped,
RelOptInfo *rel_plain,
RelAggInfo *agg_info);
Nice catch! Fixed in v24.
14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
the output kinda huge. It might also be nice to wrap the long queries
onto multiple lines to make them easier to read.
One of the challenges in this patch is generating the correct target
list for each grouped relation. So I'm kind of inclined to retain
VERBOSE in EXPLAIN. As I recall, the output target list in the test
cases saved me several times during development when I introduced
problematic code changes.
I wrapped the long queries in v24.
- Richard
Attachments:
v24-0001-Allow-negative-aggtransspace-to-indicate-unbound.patch
From dc5d4fb9bae1412c3230329d22616e13f3cc9662 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 7 Oct 2025 10:16:37 +0900
Subject: [PATCH v24 1/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
Per idea from Robert Haas, though applied differently than originally
suggested.
Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
5 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
v24-0002-Implement-Eager-Aggregation.patch
From d03a39b1a88bee1280fbdd61529eac428902b39e Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v24 2/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <guofenglinux@gmail.com>
Author: Antonin Houska <ah@cybertec.at> (in an older version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version)
Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version)
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +-
src/backend/optimizer/path/allpaths.c | 467 ++++-
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 370 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 51 +
src/backend/optimizer/util/relnode.c | 650 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 117 ++
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 4 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1714 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 380 ++++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 4293 insertions(+), 76 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6dc04e916dc..f5a57b9cbd5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..39e658b7808 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any null-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..e39c5da63eb 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -264,6 +264,9 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Keep searching if join order is not valid */
if (joinrel)
{
+ bool is_top_rel = bms_equal(joinrel->relids,
+ root->all_query_rels);
+
/* Create paths for partitionwise joins. */
generate_partitionwise_join_paths(root, joinrel);
@@ -273,12 +276,28 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
* rel once we know the final targetlist (see
* grouping_planner).
*/
- if (!bms_equal(joinrel->relids, root->all_query_rels))
+ if (!is_top_rel)
generate_useful_gather_paths(root, joinrel, false);
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (joinrel->grouped_rel != NULL && !is_top_rel)
+ {
+ RelOptInfo *grouped_rel = joinrel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel);
+ set_cheapest(grouped_rel);
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d7ff36d89be..cc562518b04 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_simple_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,12 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for simple rels (i.e., base or "other" member
+ * relations) where possible.
+ */
+ setup_simple_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +335,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_simple_grouped_rels
+ * For each simple relation, build a grouped simple relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_simple_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +604,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1359,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3332,6 +3415,345 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped relation.
+ *
+ * The information needed is provided by the RelAggInfo structure stored in
+ * "grouped_rel".
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel)
+{
+ RelAggInfo *agg_info = grouped_rel->agg_info;
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+ List *group_pathkeys = NIL;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping, and generate the pathkeys that represent the grouping
+ * requirements in that case.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+ if (can_sort)
+ {
+ RelOptInfo *top_grouped_rel;
+ List *top_group_tlist;
+
+ top_grouped_rel = IS_OTHER_REL(rel) ?
+ rel->top_parent->grouped_rel : grouped_rel;
+ top_group_tlist =
+ make_tlist_from_pathtarget(top_grouped_rel->agg_info->target);
+
+ group_pathkeys =
+ make_pathkeys_for_sortclauses(root, agg_info->group_clauses,
+ top_group_tlist);
+ }
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3491,11 +3913,19 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * We also run generate_grouped_paths() for the grouped relation of
+ * each just-processed joinrel, followed by set_cheapest() for that
+ * grouped relation.
*/
foreach(lc, root->join_rel_level[lev])
{
+ bool is_top_rel;
+
rel = (RelOptInfo *) lfirst(lc);
+ is_top_rel = bms_equal(rel->relids, root->all_query_rels);
+
/* Create paths for partitionwise joins. */
generate_partitionwise_join_paths(root, rel);
@@ -3505,12 +3935,28 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
* once we know the final targetlist (see grouping_planner's and
* its call to apply_scanjoin_target_to_paths).
*/
- if (!bms_equal(rel->relids, root->all_query_rels))
+ if (!is_top_rel)
generate_useful_gather_paths(root, rel, false);
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (rel->grouped_rel != NULL && !is_top_rel)
+ {
+ RelOptInfo *grouped_rel = rel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel);
+ set_cheapest(grouped_rel);
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4380,6 +4826,25 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (child_rel->grouped_rel != NULL &&
+ !bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel = child_rel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel);
+ set_cheapest(grouped_rel);
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..43b84d239ed 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
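+ *
+ * For illustration (with generic tables), given a query like
+ *
+ *     SELECT a.x, sum(b.y) FROM a JOIN b ON a.id = b.a_id GROUP BY a.x;
+ *
+ * the first strategy joins the partially aggregated "b", grouped by
+ * b.a_id, to "a" (i.e., "AGG(b) JOIN a"), while the second applies
+ * partial aggregation on top of the paths of "a JOIN b".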
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation. The grouped-row estimate is only needed when
+ * partial aggregation would be applied on top of this join, i.e.,
+ * when there is not exactly one grouped input rel.
+ */
+ agg_info = create_rel_agg_info(root, joinrel, rel1_empty == rel2_empty);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the set of relids where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the set of relids where
+ * partial aggregation is applied. Note that these values may be
+ * updated later if it is determined that grouped paths can be
+ * constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the set of relids at which partial aggregation is applied in
+ * the grouped input relation.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..b8d1c7e88a3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,12 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
+static EquivalenceClass *get_eclass_for_sortgroupclause(PlannerInfo *root,
+ SortGroupClause *sgc,
+ Expr *expr);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +636,368 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Check if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
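+ *
+ * (For illustration: an aggregate like array_agg() accumulates every
+ * input value in its transition state, so its partial-aggregation
+ * state can grow with the number of input rows.)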
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtransspace < 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
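+ *
+ * For instance (illustrative), in "SELECT sum(t1.x), sum(t2.y) FROM
+ * t1, t2 ..." every base rel is referenced by some aggregate, so
+ * there is no ungrouped side left to join a grouped rel against.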
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *ecs = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ ecs = lappend(ecs, get_eclass_for_sortgroupclause(root, sgc, tle->expr));
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, ecs)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->ec = ec;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
+/*
+ * get_eclass_for_sortgroupclause
+ * Given a group clause and an expression, find an existing equivalence
+ * class that the expression is a member of; return NULL if none.
+ */
+static EquivalenceClass *
+get_eclass_for_sortgroupclause(PlannerInfo *root, SortGroupClause *sgc,
+ Expr *expr)
+{
+ Oid opfamily,
+ opcintype,
+ collation;
+ CompareType cmptype;
+ Oid equality_op;
+ List *opfamilies;
+
+ /* Punt if the group clause is not sortable */
+ if (!OidIsValid(sgc->sortop))
+ return NULL;
+
+ /* Find the operator in pg_amop --- failure shouldn't happen */
+ if (!get_ordering_op_properties(sgc->sortop,
+ &opfamily, &opcintype, &cmptype))
+ elog(ERROR, "operator %u is not a valid ordering operator",
+ sgc->sortop);
+
+ /* Because SortGroupClause doesn't carry collation, consult the expr */
+ collation = exprCollation((Node *) expr);
+
+ /*
+ * EquivalenceClasses need to contain opfamily lists based on the family
+ * membership of mergejoinable equality operators, which could belong to
+ * more than one opfamily. So we have to look up the opfamily's equality
+ * operator and get its membership.
+ */
+ equality_op = get_opfamily_member_for_cmptype(opfamily,
+ opcintype,
+ opcintype,
+ COMPARE_EQ);
+ if (!OidIsValid(equality_op)) /* shouldn't happen */
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ COMPARE_EQ, opcintype, opcintype, opfamily);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ if (!opfamilies) /* certainly should find some */
+ elog(ERROR, "could not find opfamilies for equality operator %u",
+ equality_op);
+
+ /* Now find a matching EquivalenceClass */
+ return get_eclass_for_sort_expr(root, expr, opfamilies, opcintype,
+ collation, sgc->tleSortGroupRef,
+ NULL, false);
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..69b8b0c2ae0 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,57 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = oldinfo->group_clauses;
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..cf1bc672137 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * simple relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions; otherwise we should not have reached here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this simple
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel, true);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given simple relation are not considered
+ * useful, skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Track the set of relids at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,536 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ *
+ * calculate_grouped_rows: if true, calculate the estimated number of grouped
+ * rows for the relation. If false, skip the estimation to avoid unnecessary
+ * planning overhead.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel, if one
+ * is possible, must have been created already. So we can simply reuse
+ * the parent's RelAggInfo, if it exists, with appropriate variable
+ * substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful
+ * iff the average group size is no less than
+ * min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (group_clauses == NIL)
+ return NULL;
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+ result->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
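+ *
+ * For example, with rel->rows = 1000 and grouped_rows = 100, the
+ * average group size is 10, which passes the default threshold of 8.0
+ * (the boot value of min_eager_agg_group_size).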
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
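+ *
+ * For example (illustrative), in "a LEFT JOIN b" the NULL-extended
+ * rows for unmatched rows of "a" only come into existence at the join
+ * level, so a partial aggregation pushed down to "b" would never see
+ * them.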
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
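+ *
+ * For example (illustrative), if b.z is needed by an upper join qual
+ * "a.z = b.z" but is not a GROUP BY column, two rows of "b" with
+ * different z values must not be collapsed into one partially
+ * aggregated row, since one might match a given row of "a" while the
+ * other does not.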
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
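+ /*
+ * If the loop above exited via break, lc is non-NULL and the Var is
+ * referenced by some aggregate; it then qualifies unless it also
+ * appears as a plain Var in the targetlist or havingQual (which is
+ * what root->tlist_vars records).
+ */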
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * When checking whether the Var is needed by joins above this rel, we
+ * want to exclude the case where the Var is needed only by the final
+ * targetlist. attr_needed represents the final targetlist as "relation
+ * 0", so add that to the relids we subtract out below.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ Assert(IsA(expr, Var));
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+ ListCell *lc1;
+
+ Assert(IsA(ge_info->expr, Var));
+ Assert(ge_info->sortgroupref > 0);
+
+ if (equal(expr, ge_info->expr))
+ return ge_info->sortgroupref;
+
+ if (ge_info->ec == NULL ||
+ !bms_is_member(((Var *) expr)->varno, ge_info->ec->ec_relids))
+ continue;
+
+ /*
+ * Scan the EquivalenceClass, looking for a match to the given
+ * expression. Child members are stored separately and never appear
+ * in ec_members, as the Assert below checks.
+ */
+ foreach(lc1, ge_info->ec->ec_members)
+ {
+ EquivalenceMember *em = (EquivalenceMember *) lfirst(lc1);
+
+ /* Child members should not exist in ec_members */
+ Assert(!em->em_is_child);
+
+ if (equal(expr, em->em_expr))
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
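
A note on the equalimage check near the top of this hunk: if the grouping
key's btree opfamily provides no BTEQUALIMAGE_PROC, or the support
function reports false (as it does for nondeterministic text collations),
we simply give up on eager aggregation for the relation. Here is a sketch
of a query that I would expect to hit that path -- the table names are
made up, and I'm relying on type numeric having no equalimage support
(equal numeric values can differ in display scale):

CREATE TABLE ea_num1 (a int, b numeric);
CREATE TABLE ea_num2 (b numeric, c float8);

-- The partial aggregate below the join would have to group on the
-- numeric column t2.b; the missing BTEQUALIMAGE_PROC should make us
-- bail out, so no Partial Aggregate node is expected in this plan.
EXPLAIN (COSTS OFF)
SELECT t1.a, avg(t2.c)
  FROM ea_num1 t1
  JOIN ea_num2 t2 ON t1.b = t2.b
GROUP BY t1.a;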
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..b176d5130e4 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
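
For reference, the two new GUCs are plain userset knobs, so they can be
adjusted per session; the names and defaults below match the entries
above:

-- Turn the feature off for a session, or put it back to its default.
SET enable_eager_aggregate = off;
RESET enable_eager_aggregate;

-- Require a larger estimated average group size before partial
-- aggregation below a join is considered worthwhile (default 8.0).
SET min_eager_agg_group_size = 50.0;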
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c36fcb9ab61..c5d612ab552 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b12a2508d8c..798b431c5aa 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -391,6 +391,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1040,6 +1049,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1124,6 +1141,63 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create paths for a grouped relation.
+ *
+ * "target" is the default result targetlist for Paths scanning this grouped
+ * relation; list of Vars/Exprs, cost, width.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "group_clauses" and "group_exprs" are lists of SortGroupClauses and the
+ * corresponding grouping expressions.
+ *
+ * "apply_at" tracks the set of relids at which partial aggregation is applied
+ * in the paths of this grouped relation.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the output tlist for the grouped paths */
+ struct PathTarget *target;
+
+ /* the output tlist for the input paths */
+ struct PathTarget *agg_input;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+
+ /* the set of relids partial aggregation is applied at */
+ Relids apply_at;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* are the grouped paths considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3268,6 +3342,49 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the EquivalenceClass it belongs to. This information is necessary to
+ * reproduce correct grouping semantics at different levels of the join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* the equivalence class the expression belongs to */
+ EquivalenceClass *ec pg_node_attr(copy_as_scalar, equal_as_scalar);
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
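
To make the bookkeeping above concrete, here is how I expect these
structs to be populated for the first query of the new regression test
(a sketch of my reading, not literal planner output):

SELECT t1.a, avg(t2.c)
  FROM eager_agg_t1 t1
  JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a;
-- AggClauseInfo:    aggref = avg(t2.c), agg_eval_at = {t2}
-- GroupingExprInfo: expr = t1.a, plus its sortgroupref and the
--                   EquivalenceClass for t1.a, if any
-- RelAggInfo for the grouped rel over t2:
--   target    emits {t2.b, PARTIAL avg(t2.c)}
--   agg_input emits {t2.b, t2.c}, with a sortgroupref set on t2.b
-- which is exactly the Partial HashAggregate node visible in the
-- regression plans further down.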
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..da60383c2aa 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,8 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +353,6 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..f6a62df0b43 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,8 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..fc0f8c14ec9
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1714 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Ensure eager aggregation is enabled (it is on by default).
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Scan one table, partially aggregate the result, join it to the other
+-- table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Join two tables, partially aggregate the result, join it to the
+-- remaining table, and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When the GROUP BY clause matches the partition key, full aggregation
+-- is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial
+-- aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When the GROUP BY clause matches the partition key, full aggregation
+-- is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When the GROUP BY clause does not match the partition key, partial
+-- aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cd37f549b5a..bdbf21a874d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2840,20 +2840,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index cb12bf53719..fc84929a002 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..e328a83b4c7
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,380 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..02b5b041c45 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
On Tue, 7 Oct 2025 at 23:57, Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Oct 6, 2025 at 10:59 PM David Rowley <dgrowleyml@gmail.com> wrote:
6. Shouldn't this be using lappend()?
agg_clause_list = list_append_unique(agg_clause_list, ac_info);
I don't understand why ac_info could already be in the list. You've
just done: ac_info = makeNode(AggClauseInfo);
A query can specify the same Aggref expressions multiple times in the
target list. Using lappend here can lead to duplicate partial Aggref
nodes in the targetlist of a grouped path, which is what I want to
avoid.
I was getting that mixed up with list_append_unique_ptr().
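(A minimal sketch of the distinction, for anyone following along: it
assumes two AggClauseInfo nodes built with identical contents, and the
list name is made up.)

    List *infos = NIL;
    AggClauseInfo *a = makeNode(AggClauseInfo);
    AggClauseInfo *b = makeNode(AggClauseInfo);

    /* assume a and b get filled with identical field values here */

    infos = list_append_unique(infos, a);     /* appended */
    infos = list_append_unique(infos, b);     /* skipped: equal(a, b) */
    infos = list_append_unique_ptr(infos, b); /* appended: distinct pointer */

list_append_unique() deduplicates with equal(), so separately allocated
but structurally identical nodes collapse into one entry, while
list_append_unique_ptr() compares only the pointers themselves.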
9. In get_expression_sortgroupref(), a comment claims "We ignore child
members here.". I think that's outdated since ec_members no longer has
child members.
I think that comment is used to explain why we only scan ec_members
here. Similar comments can be found in many other places, such as in
equivclass.c:
/*
* Found our match. Scan the other EC members and attempt to generate
* joinclauses. Ignore children here.
*/
foreach(lc2, cur_ec->ec_members)
{
I'd say that's also wrong. "Ignore" means not to pay attention to
something that's there. The child members are not there.
11. The way you've written the header comments for typedef struct
RelAggInfo seems weird. I've only ever seen extra details in the
header comment when the inline comments have been kept to a single
line. You're spanning multiple lines, so why have the out of line
comments in the header at all?
I've also updated the comments within RelAggInfo to use one-line
style.
The style I'd thought of had the comments on the same line as the
field. Something like struct EquivalenceClass.
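(For illustration, a minimal sketch of that same-line style; the struct
body here is hypothetical, with everything except apply_at elided:)

    typedef struct RelAggInfo
    {
        /* ... other fields elided ... */
        Relids      apply_at;   /* relids at which partial aggregation is
                                 * applied */
    } RelAggInfo;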
I wrapped the long queries in v24.
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
The above comment and command don't match, based on my understanding
from looking at postgresql.conf.sample and guc_parameters.dat.
David
On Tue, Oct 7, 2025 at 6:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
10. I don't think this comment quite makes sense:
* "apply_at" tracks the lowest join level at which partial aggregation is
* applied.
maybe "minimum set of rels to join before partial aggregation can be applied"?
I've updated the comment for apply_at to clarify that it refers to the
relids at which partial aggregation is applied.
I've also updated the comments within RelAggInfo to use one-line
style.
I retained the name of this field though.
For what it's worth, I also don't like that field name. I'm not sure
what to propose instead, but I don't think apply_at is very clear.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Oct 6, 2025 at 9:59 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
Barring any objections, I'll plan to push v23 in a couple of days.
I've pushed v24 -- thanks for all the reviews! Now bracing for the
upcoming bug reports.
- Richard
On Wed, Oct 8, 2025 at 8:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
The above comment and command don't match, based on my understanding
from looking at postgresql.conf.sample and guc_parameters.dat.
Right. This GUC was disabled by default prior to v17 of the patch set,
and this is a
leftover from that. Will push a fix. Thanks for pointing it out!
- Richard
On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Oct 7, 2025 at 6:57 AM Richard Guo <guofenglinux@gmail.com> wrote:
I retained the name of this field though.
For what it's worth, I also don't like that field name. I'm not sure
what to propose instead, but I don't think apply_at is very clear.
This field represents the set of relids at which partial aggregation
is applied. So how about naming it partial_agg_designated_relids?
That feels a bit verbose, though. How about partial_agg_relids or,
for brevity, agg_relids instead?
- Richard
Richard Guo <guofenglinux@gmail.com> writes:
On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
For what it's worth, I also don't like that field name. I'm not sure
what to propose instead, but I don't think apply_at is very clear.
This field represents the set of relids at which partial aggregation
is applied. So how about naming it partial_agg_designated_relids?
That feels a bit verbose, though. How about partial_agg_relids or,
for brevity, agg_relids instead?
I might be missing a subtlety here, but how about
"apply_aggregation_at" or "apply_partial_agg_at"?
I don't think including "relids" in the field name adds anything,
given the field's declared type and comments.
regards, tom lane
On Thu, Oct 9, 2025 at 11:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
For what it's worth, I also don't like that field name. I'm not sure
what to propose instead, but I don't think apply_at is very clear.
This field represents the set of relids at which partial aggregation
is applied. So how about naming it partial_agg_designated_relids?
That feels a bit verbose, though. How about partial_agg_relids or,
for brevity, agg_relids instead?
I might be missing a subtlety here, but how about
"apply_aggregation_at" or "apply_partial_agg_at"?
I don't think including "relids" in the field name adds anything,
given the field's declared type and comments.
Fair point.
'agg' seems better to me than 'aggregation' when used in a name: it's
shorter, and it's unlikely anyone would interpret it as anything other
than "aggregation".
I kind of wonder whether we need to include 'partial' in the name.
Given the context, it seems very clear that we're referring to
partial aggregation rather than final aggregation.
So I'm weighing between "apply_partial_agg_at" and "apply_agg_at".
- Richard
Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Oct 6, 2025 at 9:59 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <guofenglinux@gmail.com> wrote:
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
Barring any objections, I'll plan to push v23 in a couple of days.
I've pushed v24 -- thanks for all the reviews! Now bracing for the
upcoming bug reports.
Thanks for finishing this! The lack of feedback I encountered earlier made me
so frustrated that I could not find motivation to collaborate with you. I'm
happy now that my effort was not wasted.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Oct 9, 2025 at 2:09 PM Antonin Houska <ah@cybertec.at> wrote:
Richard Guo <guofenglinux@gmail.com> wrote:
I've pushed v24 -- thanks for all the reviews! Now bracing for the
upcoming bug reports.
Thanks for finishing this! The lack of feedback I encountered earlier made me
so frustrated that I could not find motivation to collaborate with you. I'm
happy now that my effort was not wasted.
Your efforts in the earlier versions were very important for getting
this feature done. Thank you for your work.
- Richard
On Thu, Oct 9, 2025 at 10:49 AM Richard Guo <guofenglinux@gmail.com> wrote:
On Wed, Oct 8, 2025 at 8:14 PM David Rowley <dgrowleyml@gmail.com> wrote:
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
The above comment and command don't match, based on my understanding
from looking at postgresql.conf.sample and guc_parameters.dat.
Right. This GUC was disabled by default prior to v17, and this is a
leftover from that. Will push a fix. Thanks for pointing it out!
I noticed an unnecessary header include in initsplan.c. Will fix that
as well.
- Richard