WIP: Aggregation push-down - take2

Started by Fujii.Yuki@df.MitsubishiElectric.co.jpabout 4 years ago11 messageshackers

Fujii.Yuki@df.MitsubishiElectric.co.jp

about 4 years ago

Hi everyone.

I develop postgresql's extension such as fdw in my work.
I'm interested in using postgresql for OLAP.
After [1]https://commitfest.postgresql.org/32/ having been withdrawn, I reviewed [1]https://commitfest.postgresql.org/32/.
I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign table.
So, I would like to ask you a question on this patch in this new thread.

I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and this patch.
1. Things I found
I execute a query which contain join of postgres_fdw's foreign table and a table and aggregation of the join result.
In my setting, my prototype reduce this query's response by 93%.
2. Differences between my prototype and this patch
(1) Pushdown aggregation of foeign table if FDW pushdown partial aggregation
(2) postgres_fdw pushdowns some partial aggregations
I attached my prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of accept of this patch's function.
I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any reviewers / committers are
interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

[1]: https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Fujii.Yuki@df.MitsubishiElectric.co.jp

almost 4 years ago

In reply to: Fujii.Yuki@df.MitsubishiElectric.co.jp (#1)

RE: WIP: Aggregation push-down - take2

Hi everyone.

I rebased the following patches which were submitted in [1]https://commitfest.postgresql.org/32/.
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch

I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.

I'm going to register the rebased patch in next commitfest.

Sincerely yours,
Yuuki Fujii

[1]: https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Show quoted text

-----Original Message-----
From: Fujii.Yuki@df.MitsubishiElectric.co.jp
<Fujii.Yuki@df.MitsubishiElectric.co.jp>
Sent: Friday, April 15, 2022 4:33 PM
To: pgsql-hackers@lists.postgresql.org
Cc: david@pgmasters.net; ah@cybertec.at; tgl@sss.pgh.pa.us; Tomas Vondra
<tomas.vondra@enterprisedb.com>; zhihui.fan1213@gmail.com;
legrand_legrand@hotmail.com; daniel@yesql.se
Subject: [CAUTION!! MELCO?] WIP: Aggregation push-down - take2

Hi everyone.

I develop postgresql's extension such as fdw in my work.
I'm interested in using postgresql for OLAP.
After [1] having been withdrawn, I reviewed [1].
I think that this patch is realy useful when using OLAP queries.
Furthermore, I think it would be more useful if this patch works on a foreign
table.
So, I would like to ask you a question on this patch in this new thread.

I changed this patch a little and confirmed that my idea is true.
The followings are things I found and differences of between my prototype and
this patch.
1. Things I found
I execute a query which contain join of postgres_fdw's foreign table and a
table and aggregation of the join result.
In my setting, my prototype reduce this query's response by 93%.
2. Differences between my prototype and this patch
(1) Pushdown aggregation of foeign table if FDW pushdown partial
aggregation
(2) postgres_fdw pushdowns some partial aggregations I attached my
prototype source code and content of my experiment.
I want to resume development of this patch if there is some possibility of
accept of this patch's function.
I took a contact to Mr.Houska on resuming development of this patch.
As a result, Mr.Houska advised for me that I ask in pgsql-hackers whether any
reviewers / committers are interested to work on the patch.
Is anyone interested in my work?

Sincerely yours.
Yuuki Fujii

[1] https://commitfest.postgresql.org/32/

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 3 years ago

In reply to: Fujii.Yuki@df.MitsubishiElectric.co.jp (#2)

Re: WIP: Aggregation push-down - take2

Hi,

On 7/12/22 08:49, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote:

Hi everyone.

I rebased the following patches which were submitted in [1].
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch

I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.

I'm going to register the rebased patch in next commitfest.

I've started looking at this patch series again, but I wonder what's the
plan. The last patch version no longer applies, so I rebased it - see
the attachment. The failures were pretty minor, but there're two warnings:

pathnode.c:3174:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
Node *agg_exprs;
^
pathnode.c:3252:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
Node *agg_exprs;
^

so there seem to be some loose ends. Moreover, there are two failures in
make check, due to plan changes like this:

+ Finalize GroupAggregate
    Group Key: p.i
-   ->  Nested Loop
-         ->  Partial HashAggregate
-               Group Key: c1.parent
-               ->  Seq Scan on agg_pushdown_child1 c1
-         ->  Index Scan using agg_pushdown_parent_pkey on ...
-               Index Cond: (i = c1.parent)
-(8 rows)
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on ...
+                     Index Cond: (i = c1.parent)
+(10 rows)

This seems somewhat strange - maybe the plan is correct, but the extra
sort seems unnecessary.

However, maybe I'm confused/missing something? The above message says
v17 having parts 0001-0003, but there's only one patch in v18. So maybe
I failed to apply some prior patch?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Antonin Houska

ah@cybertec.at

over 3 years ago

In reply to: Tomas Vondra (#3)

Re: WIP: Aggregation push-down - take2

Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Hi,

On 7/12/22 08:49, Fujii.Yuki@df.MitsubishiElectric.co.jp wrote:

Hi everyone.

I rebased the following patches which were submitted in [1].
v17-0001-Introduce-RelInfoList-structure.patch
v17-0002-Aggregate-push-down-basic-functionality.patch
v17-0003-Use-also-partial-paths-as-the-input-for-grouped-path.patch

I checked I can apply the rebased patch to commit 2cd2569c72b8920048e35c31c9be30a6170e1410.

I'm going to register the rebased patch in next commitfest.

I've started looking at this patch series again, but I wonder what's the
plan. The last patch version no longer applies, so I rebased it - see
the attachment. The failures were pretty minor, but there're two warnings:

pathnode.c:3174:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
Node *agg_exprs;
^
pathnode.c:3252:11: warning: variable 'agg_exprs' set but not used
[-Wunused-but-set-variable]
Node *agg_exprs;
^

so there seem to be some loose ends. Moreover, there are two failures in
make check, due to plan changes like this:
+ Finalize GroupAggregate
Group Key: p.i
-   ->  Nested Loop
-         ->  Partial HashAggregate
-               Group Key: c1.parent
-               ->  Seq Scan on agg_pushdown_child1 c1
-         ->  Index Scan using agg_pushdown_parent_pkey on ...
-               Index Cond: (i = c1.parent)
-(8 rows)
+   ->  Sort
+         Sort Key: p.i
+         ->  Nested Loop
+               ->  Partial HashAggregate
+                     Group Key: c1.parent
+                     ->  Seq Scan on agg_pushdown_child1 c1
+               ->  Index Scan using agg_pushdown_parent_pkey on ...
+                     Index Cond: (i = c1.parent)
+(10 rows)
This seems somewhat strange - maybe the plan is correct, but the extra
sort seems unnecessary.

However, maybe I'm confused/missing something? The above message says
v17 having parts 0001-0003, but there's only one patch in v18. So maybe
I failed to apply some prior patch?

I've rebased the last version I had on my workstation (v17), the regression
tests just worked. Maybe v18 was messed up. v20 is attached.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 3 years ago

In reply to: Antonin Houska (#4)

Re: WIP: Aggregation push-down - take2

Hi,

I did a quick initial review of the v20 patch series. I plan to do a
more thorough review over the next couple days, if time permits. In
general I think the patch is in pretty good shape.

I've added a bunch of comments in a number of places - see the "review
comments" parts for each of the original parts. That should make it
easier to deal with all the items. I'll go through the main stuff here:

1) I was somewhat confused why we even need RelInfoList, when it merely
wraps existing fields, but I guess it's because we need multiple such
pairs - one for joins, one for grouped rels. Correct?

2) While reading the README, I was somewhat confused because it seems to
suggest we have to push the aggregate only to baserel level, but then it
also talks about pushing to other places (above joins). There's a couple
other places in the README that confused me a bit, see the XXX comments.

In general, I think the README focuses on explaining the motivation,
i.e. why we want to do this, but it's somewhat light on how it's done.
The other parts talk about the implementation in more detail.

3) I tweaked a couple places in allpaths.c to make it more readable, but
I admit that's a somewhat subjective measure, so feel free to undo that.

4) setup_base_grouped_rels compares bitmaps before looking at
reloptkind, which seems to be cheaper so maybe the checks should happen
in the opposite order (not a huge difference, though)

5) add_grouped_path seems to be a bit confusing, because the name makes
it look like it does about the same stuff as add_path/add_partial_path,
when that's not quite true

6) 0002 failed to add enable_agg_pushdown to the sample file, which
leads to a failure in regression tests

7) when I change enable_agg_pushdown to true and run regression tests, I
get a bunch of failures like

ERROR: WindowFunc found where not expected

Seems we don't handle window functions correctly somewhere, or maybe
setup_aggregate_pushdown should check/reject hasWindowFuncs too?

8) create_ordinary_grouping_paths changes when set_cheapest() gets
called, but I can't quite convince myself the change is correct. How
come it's correct to check pathlist instead of partial_pathlist (as before).

9) I see create_agg_sorted_path is quite picky about the subpath
pathkeys, essentially requiring it to be a prefix of group_pathkeys.
Seems unnecessary, no? Even if we sort/group on different pathkeys, that
reduces the cardinality, and we may do sort later (or just finalize
using hashagg).

Furthermore, we generally try creating a sort with the proper ordering
in other places - why not here? I mean, if subpath has pathkeys=A and we
need [A,B], we could try adding suitable IncrementalSort, no? Or even
full Sort, or something. Or is that not beneficial here?

10) I don't understand why create_agg_hashed_path limits the hashtable
size to work_mem - shouldn't it do something like cost_agg to account
for spilling to disk?

11) There's an unnecessary/unrelated change in trigger.c.

12) I improved/reworded a couple comments where I initially was unsure
what exactly that means. Hopefully I got it right.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Antonin Houska

ah@cybertec.at

over 3 years ago

In reply to: Tomas Vondra (#5)

Re: WIP: Aggregation push-down - take2

Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Hi,

I did a quick initial review of the v20 patch series. I plan to do a
more thorough review over the next couple days, if time permits. In
general I think the patch is in pretty good shape.

Thanks.

I've added a bunch of comments in a number of places - see the "review
comments" parts for each of the original parts. That should make it
easier to deal with all the items. I'll go through the main stuff here:

Unless I miss something, all these items are covered in context below, except
for this one:

7) when I change enable_agg_pushdown to true and run regression tests, I
get a bunch of failures like

ERROR: WindowFunc found where not expected

Seems we don't handle window functions correctly somewhere, or maybe
setup_aggregate_pushdown should check/reject hasWindowFuncs too?

We don't need to reject window functions, window functions are processed after
grouping/aggregation. The problem I noticed in the regression tests was that a
window function referenced a (non-window) aggregate. We just need to ensure
that pull_var_clause() recurses into that window function in such cases:

Besides the next version, v21-fixes.patch file is attached. It tries to
summarize all the changes between v21 and v22. (I wonder if this attachment
makes the cfbot fail.)

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8..8dc39765f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
 	Assert(root->grouped_var_list == NIL);

 	tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
-								  PVC_INCLUDE_AGGREGATES);
+								  PVC_INCLUDE_AGGREGATES |
+								  PVC_RECURSE_WINDOWFUNCS);

/*
* Although GroupingFunc is related to root->parse->groupingSets, this

---
src/backend/optimizer/util/relnode.c | 11 +++++++++++
src/include/nodes/pathnodes.h | 3 +++
2 files changed, 14 insertions(+)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 94720865f47..d4367ba14a5 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
/*
* build_rel_hash
*	  Construct the auxiliary hash table for relation specific data.
+ *
+ * XXX Why is this renamed, leaving out the "join" part? Are we going to use
+ * it for other purposes?

Yes, besides join relation, it's used to find the "grouped relation" by
Relids. This change tries to follow the suggestion "Maybe an appropriate
preliminary patch ..." in [1]/messages/by-id/9726.1542577439@sss.pgh.pa.us, but I haven't got any feedback whether my
understanding was correct.

+ * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
+ * for planner functions.

I think that the reason was that, with the patch applied, PlannerInfo contains
multiple fields of the RelInfoList type, so build_rel_hash() needs an
information which one it should process. Passing the exact field is simpler
than passing PlannerInfo plus some additional information.

*/
static void
build_rel_hash(RelInfoList *list)
@@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
*	  Find a base or join relation entry.
+ *
+ * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
+ * for planner functions.

For the same reason that build_rel_hash() receives the list explicitly, see
above.

+ * XXX I don't understand why we need both this and find_join_rel.

Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
think that

find_join_rel(root, relids);

is a little bit easier to read than

(RelOptInfo *) find_rel_info(root->join_rel_list, relids);

*/
static void *
find_rel_info(RelInfoList *list, Relids relids)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ca7d5ab51e..018ce755720 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -88,6 +88,9 @@ typedef enum UpperRelationKind
* present and valid when rel_hash is not NULL.  Note that we still maintain
* the list even when using the hash table for lookups; this simplifies life
* for GEQO.
+ *
+ * XXX I wonder why we actually need a separate node, merely wrapping fields
+ * that already existed ...

This is so that the existing fields can still be printed out
(nodes/outfuncs.c).

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2fd1a962699..6f6b7d0b93b 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
efficient to apply the partial aggregation to the output of base relation
scan, and finalize it when we have all relations of the query joined:

+XXX review: Hmm, do we need to push it all the way down to base relations? Or
+would it make sense to do the agg on an intermediate level? Say, we're joining
+three tables A, B and C. Maybe the agg could/should be evaluated on top of join
+A+B, before joining with C? Say, maybe the aggregate references columns from
+both base relations?
+
EXPLAIN
SELECT a.i, avg(b.y)
FROM a JOIN b ON b.j = a.i

Another example below does show the partial aggregates at join level.

+XXX Perhaps mention this may also mean the partial ggregate could be pushed
+to a remote server with FDW partitions?

Even if it's not implemented in the current patch version?

+
Note that there's often no GROUP BY expression to be used for the partial
aggregation, so we use equivalence classes to derive grouping expression: in
the example above, the grouping key "b.j" was derived from "a.i".
+XXX I think this is slightly confusing - there is a GROUP BY expression for the
+partial aggregate, but as stated in the query it may not reference the side of
+a join explicitly.

ok, changed.

Also note that in this case the partial aggregate uses the "b.j" as grouping
column although the column does not appear in the query target list. The point
is that "b.j" is needed to evaluate the join condition, and there's no other
way for the partial aggregate to emit its values.
+XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
+helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
+importantly, isn't this a direct consequence of the equivalence classes stuff
+mentioned in the preceding paragraph?

The equivalence class is just a mechanism to derive expressions which are not
explicitly mentioned in the query, but there's always a question whether you
need to derive any expression for particular table or not. Here I tried to
explain that the choice of join columns is related to the choice of grouping
keys for the partial aggregate.

I've deleted this paragraph and added a note to the previous one.

Besides base relation, the aggregation can also be pushed down to join:

EXPLAIN
@@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
->  Hash
->  Seq Scan on a

+XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
+my example with three tables. Maybe this suggests the initial reference to
+base relations is a bit confusing.

I tried to use the simplest example to demonstrate the concepts, then extended
it to the partially-aggregated joins.

+XXX I think this is a good explanation of the motivation for this patch, but
+maybe it'd be good to go into more details about how we decide if it's correct
+to actually do the pushdown, data structures etc. Similar to earlier parts of
+this README.

Added two paragraphs, see "Regarding correctness...".

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f00f900ff41..6d2c2f4fc36 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
/*
* Now that the sizes are known, we can estimate the sizes of the grouped
* relations.
+	 *
+	 * XXX Seems more consistent with code nearby.
*/
-	if (root->grouped_var_list)
-		setup_base_grouped_rels(root);
+	setup_base_grouped_rels(root);

In general I prefer not calling a function if it's obvious that it's not
needed, but on the other hand the test of the 'grouped_var_list' field may be
considered disturbing from the caller's perspective. I've got no strong
opinion on this, so I can accept this proposal.

/*
- * setup_based_grouped_rels
+ * setup_base_grouped_rels
* For each "plain" relation build a grouped relation if aggregate pushdown
* is possible and if this relation is suitable for partial aggregation.
*/

Fixed, thanks.

{
Index rti;

+	/* If there are no grouped relations, estimate their sizes. */
+	if (!root->grouped_var_list)
+		return;
+

Accepted, but with different wording (s/relations/expressions/).

+		/* XXX Shouldn't this check be earlier? Seems cheaper than the check
+		 * calling bms_nonempty_difference, for example. */
if (brel->reloptkind != RELOPT_BASEREL)
continue;

Right, moved.

rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
-		if (rel_grouped)
-		{
-			/* Make the relation available for joining. */
-			add_grouped_rel(root, rel_grouped, agg_info);
-		}
+
+		/* XXX When does this happen? */
+		if (!rel_grouped)
+			continue;
+
+		/* Make the relation available for joining. */
+		add_grouped_rel(root, rel_grouped, agg_info);

I'd use the "continue" statement if there was a lot of code in the "if
(rel_grouped) {...}" branch, but no strong preference in this case, so
accepted.

}
}

@@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Plain relation */
set_plain_rel_pathlist(root, rel, rte);
+					/* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
+
/* Add paths to the grouped relation if one exists. */
rel_grouped = find_grouped_rel(root, rel->relids,

Yes, it can. Moved.

@@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,

/*
* Apply partial aggregation to a subpath and add the AggPath to the pathlist.
+ *
+ * XXX I think this is potentially quite confusing, because the existing "add"
+ * functions add_path and add_partial_path only check if the proposed path is
+ * dominated by an existing path, pathkeys, etc. But this does more than that,
+ * perhaps even constructing new path etc.
*/
static void
add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,

Maybe, but I don't have a good idea of an alternative name.
create_group_path() already exists and the create_*_path() functions are
rather low-level. Maybe generate_grouped_path(), and at the same time rename
generate_grouping_paths() to generate_grouped_paths()? In general, the
generate_*_path*() functions do non-trivial things and eventually call
add_path().

@@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
else
elog(ERROR, "unexpected strategy %d", aggstrategy);

+	/*
+	 * Bail out if we failed to create a suitable aggregated path. This can
+	 * happen e.g. then the path does not support hashing (for AGG_HASHED),
+	 * or when the input path is not sorted.
+	 */
+	if (agg_path == NULL)
+		return;
+
/* Add the grouped path to the list of grouped base paths. */
-	if (agg_path != NULL)
-		add_path(rel, (Path *) agg_path);
+	add_path(rel, (Path *) agg_path);

ok, changed.

}

/*
@@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)

for (lev = 2; lev <= levels_needed; lev++)
{
- RelOptInfo *rel_grouped;
ListCell *lc;

/*
@@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*/
foreach(lc, root->join_rel_level[lev])
{
+ RelOptInfo *rel_grouped;
+
rel = (RelOptInfo *) lfirst(lc);

Sure, fixed.

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8b..d7a9de9645e 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
* each possible grouping expression.
*
* root->group_pathkeys must be setup before this function is called.
+ *
+ * XXX Perhaps this should check/reject hasWindowFuncs too?

create_window_paths() is called after create_grouping_paths() (see
grouping_planner()), so it should not care whether the input (possibly
grouped) paths involve the aggregate push-down or not.

*/
extern void
setup_aggregate_pushdown(PlannerInfo *root)
@@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
if (root->parse->hasTargetSRFs)
return;
+	/*
+	 * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
+	 * create_grouping_expr_grouped_var_infos to a function returning bool, and
+	 * only check that here.
+	 */
+

Hm, it looks to me like too much "indirection", and also a decriptive function
name would be tricky to invent.

/* Create GroupedVarInfo per (distinct) aggregate. */
create_aggregate_grouped_var_infos(root);

@@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
* Now that we know that grouping can be pushed down, search for the
* maximum sortgroupref. The base relations may need it if extra grouping
* expressions get added to them.
+	 *
+	 * XXX Shouldn't we do that only when adding extra grouping expressions?
*/
Assert(root->max_sortgroupref == 0);
foreach(lc, root->processed_tlist)

We don't know at this (early) stage whether those "extra grouping expression"
will be needed for at least one relation. (max_sortgroupref is used by
create_rel_agg_info())

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0ada3ba3ebe..2f4db69c1f9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/*
* The non-partial paths can come either from the Gather above or from
* aggregate push-down.
+	 *
+	 * XXX I can't quite convince myself this is correct. How come it's fine
+	 * to check pathlist and then call set_cheapest() on partially_grouped_rel?
+	 * Maybe it's correct and the comment merely needs to explain this.

It's not clear to me what makes you confused. Without my patch, the code looks
like this:

if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
{
gather_grouping_paths(root, partially_grouped_rel);
set_cheapest(partially_grouped_rel);
}

Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
patch calls set_cheapest() independent from gather_grouping_paths() because
the paths requiring the aggregate finalization can also be generated by the
aggregate push-down feature.

*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
@@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
* push-down.
*/
partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+
+	/*
+	 * If the relation already exists, it must have been created by aggregate
+	 * pushdown. We can't check how exactly it got created, but we can at least
+	 * check that aggregate pushdown is enabled.
+	 */
Assert(enable_agg_pushdown || partially_grouped_rel == NULL);

ok, done.

@@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+	 *
+	 * XXX Not sure why we're checking the partially_grouped_rel here?
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&

I think (but not verified yet) that without this test the function could
return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
realize now that there's no aggregate push-down specific processing in the
function. I've adjusted it so that it does return, but the returned value is
partially_grouped_rel rather than NULL.

@@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,

/*
* Build a new upper relation to represent the result of partially
-	 * aggregating the rows from the input relation.
+	 * aggregating the rows from the input relation. The relation may
+	 * already exist due to aggregate pushdown, in which case we don't
+	 * need to create it.
*/
if (partially_grouped_rel == NULL)
partially_grouped_rel = fetch_upper_rel(root,

ok, done.

@@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
*
* If the target was already created for the sake of aggregate push-down,
* it should be compatible with what we'd create here.
+	 *
+	 * XXX Why is this checking reltarget->exprs? What does that mean? 
*/
if (partially_grouped_rel->reltarget->exprs == NIL)
partially_grouped_rel->reltarget =

I've added this comment:

* XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
* push-down generated no paths), it created an empty target. Should we
* change the convention and have it assign NULL to reltarget instead? Or
* should we introduce a function like is_pathtarget_empty()?

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7025ebf94be..395bd093d34 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
}

/*
+ * create_agg_sorted_path
+ *		Creates a pathnode performing sorted aggregation/grouping
+ *
* Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
*
* NULL is returned if sorting of subpath output is not suitable.
+ *
+ * XXX I'm a bit confused why we need this? We now have create_agg_path and also
+ * create_agg_sorted_path and create_agg_hashed_path.

Do you mean that the function names are confusing? The functions
create_agg_sorted_path() and create_agg_hashed_path() do some checks /
preparation for the call of the existing function create_agg_path(), which is
more low-level. Should the names be something like
create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?

+ *
+ * XXX This assumes the input path to be sorted in a suitable way, but for
+ * regular aggregation we check that separately and then perhaps add sort
+ * if needed (possibly incremental one). That is, we don't do such checks
+ * in create_agg_path. Shouldn't we do the same thing before calling this
+ * new functions?
*/
AggPath *
create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
@@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
agg_exprs = agg_info->agg_exprs;
target = agg_info->target;

Likewise, it seems that you'd like to see different function name and maybe
different location of this function. Both create_agg_sorted_path() and
create_agg_hashed_path() are rather wrappers for create_agg_path().

+ /* Bail out if the input path is not sorted at all. */
if (subpath->pathkeys == NIL)
return NULL;

ok, done.

@@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,

/*
* Find all query pathkeys that our relation does affect.
+	 *
+	 * XXX Not sure what "that our relation does affect" means? Also, we
+	 * are not looking at query_pathkeys but group_pathkeys, so that's a
+	 * bit confusing. Perhaps something like this would be better:
+	 *

Indeed, the check of pathkeys was weird, I've reworked it.

@@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
}
}

+ /* Bail out if the subquery has no pathkeys for the grouping. */
if (key_subset == NIL)
return NULL;
-	/* Check if AGG_SORTED is useful for the whole query.  */
+	/*
+	 * Check if AGG_SORTED is useful for the whole query.
+	 *
+	 * XXX So this means we require the group pathkeys matched to the
+	 * subpath have to be a prefix of subpath->pathkeys. Why is that
+	 * necessary? We'll reduce the cardinality, and in the worst case
+	 * we'll have to add a separate sort (full or incremental). Or we
+	 * could finalize using hashed aggregate.

Although with different arguments, pathkeys_contained_in() is still used in
the new version of the patch. I've added a TODO comment about the incremental
sort (it did not exist when I was writing the patch), but what do you mean by
"reducing the cardinality"? Eventually the partial aggregate should reduce the
cardinality, but for the AGG_SORT strategy to work, the input sorting must be
such that the executor can recognize the group boundaries.

+	 *
+	 * XXX Doesn't seem to change any regression tests when disabled.
+	 */
if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
return NULL;

"disabled" means removal of this part (including the return statement), or
returning NULL unconditionally? Whatever you mean, please check with the new
version.

@@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
result = create_agg_path(root, rel, subpath, target,
AGG_SORTED, aggsplit,
agg_info->group_clauses,
-							 NIL,
+							 NIL,	/* qual for HAVING clause */
&agg_costs,
dNumGroups);

ok, done here as well as in create_agg_hashed_path().

@@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
&agg_costs,
dNumGroups);
+		/*
+		 * XXX But we can spill to disk in hashagg now, no?
+		 */
if (hashaggtablesize < work_mem * 1024L)
{

Yes, we can. It wasn't possible while I was writing the patch. Fixed.

diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351e..6e87ada684b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -388,6 +388,7 @@
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
+#enable_agg_pushdown = on

Done.

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1055ea70940..05192ca549a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
RelOptInfo *rel_plain, RelAggInfo *agg_info)
{
ListCell   *lc;
-	Path	   *path;
+	Path	   *path;	/* XXX why declare at this level, not in the loops */

I usually do it this way, not sure why. Perhaps because it's less typing :-) I
changed that in the next version so that we don't waste time arguing about
unimportant things.

[1]: /messages/by-id/9726.1542577439@sss.pgh.pa.us

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

vignesh C

vignesh21@gmail.com

over 3 years ago

In reply to: Antonin Houska (#6)

Re: WIP: Aggregation push-down - take2

On Thu, 17 Nov 2022 at 16:34, Antonin Houska <ah@cybertec.at> wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

Hi,

I did a quick initial review of the v20 patch series. I plan to do a
more thorough review over the next couple days, if time permits. In
general I think the patch is in pretty good shape.

Thanks.

I've added a bunch of comments in a number of places - see the "review
comments" parts for each of the original parts. That should make it
easier to deal with all the items. I'll go through the main stuff here:

Unless I miss something, all these items are covered in context below, except
for this one:

7) when I change enable_agg_pushdown to true and run regression tests, I
get a bunch of failures like

ERROR: WindowFunc found where not expected

Seems we don't handle window functions correctly somewhere, or maybe
setup_aggregate_pushdown should check/reject hasWindowFuncs too?

We don't need to reject window functions, window functions are processed after
grouping/aggregation. The problem I noticed in the regression tests was that a
window function referenced a (non-window) aggregate. We just need to ensure
that pull_var_clause() recurses into that window function in such cases:

Besides the next version, v21-fixes.patch file is attached. It tries to
summarize all the changes between v21 and v22. (I wonder if this attachment
makes the cfbot fail.)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8..8dc39765f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -355,7 +355,8 @@ create_aggregate_grouped_var_infos(PlannerInfo *root)
Assert(root->grouped_var_list == NIL);
tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
-                                                                 PVC_INCLUDE_AGGREGATES);
+                                                                 PVC_INCLUDE_AGGREGATES |
+                                                                 PVC_RECURSE_WINDOWFUNCS);
/*
* Although GroupingFunc is related to root->parse->groupingSets, this
---
src/backend/optimizer/util/relnode.c | 11 +++++++++++
src/include/nodes/pathnodes.h | 3 +++
2 files changed, 14 insertions(+)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 94720865f47..d4367ba14a5 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -382,6 +382,12 @@ find_base_rel(PlannerInfo *root, int relid)
/*
* build_rel_hash
*     Construct the auxiliary hash table for relation specific data.
+ *
+ * XXX Why is this renamed, leaving out the "join" part? Are we going to use
+ * it for other purposes?
Yes, besides join relation, it's used to find the "grouped relation" by
Relids. This change tries to follow the suggestion "Maybe an appropriate
preliminary patch ..." in [1], but I haven't got any feedback whether my
understanding was correct.
+ * XXX Also, why change the API and not pass PlannerInfo? Seems pretty usual
+ * for planner functions.
I think that the reason was that, with the patch applied, PlannerInfo contains
multiple fields of the RelInfoList type, so build_rel_hash() needs an
information which one it should process. Passing the exact field is simpler
than passing PlannerInfo plus some additional information.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -422,6 +428,11 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
*     Find a base or join relation entry.
+ *
+ * XXX Why change the API and not pass PlannerInfo? Seems pretty usual
+ * for planner functions.
For the same reason that build_rel_hash() receives the list explicitly, see
above.

+ * XXX I don't understand why we need both this and find_join_rel.

Perhaps I just wanted to keep the call sites of find_join_rel() untouched. I
think that

find_join_rel(root, relids);

is a little bit easier to read than

(RelOptInfo *) find_rel_info(root->join_rel_list, relids);
*/
static void *
find_rel_info(RelInfoList *list, Relids relids)
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 0ca7d5ab51e..018ce755720 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -88,6 +88,9 @@ typedef enum UpperRelationKind
* present and valid when rel_hash is not NULL.  Note that we still maintain
* the list even when using the hash table for lookups; this simplifies life
* for GEQO.
+ *
+ * XXX I wonder why we actually need a separate node, merely wrapping fields
+ * that already existed ...
This is so that the existing fields can still be printed out
(nodes/outfuncs.c).
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2fd1a962699..6f6b7d0b93b 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1168,6 +1168,12 @@ input of Agg node. However, if the groups are large enough, it may be more
efficient to apply the partial aggregation to the output of base relation
scan, and finalize it when we have all relations of the query joined:
+XXX review: Hmm, do we need to push it all the way down to base relations? Or
+would it make sense to do the agg on an intermediate level? Say, we're joining
+three tables A, B and C. Maybe the agg could/should be evaluated on top of join
+A+B, before joining with C? Say, maybe the aggregate references columns from
+both base relations?
+
EXPLAIN
SELECT a.i, avg(b.y)
FROM a JOIN b ON b.j = a.i
Another example below does show the partial aggregates at join level.
+XXX Perhaps mention this may also mean the partial ggregate could be pushed
+to a remote server with FDW partitions?
Even if it's not implemented in the current patch version?
+
Note that there's often no GROUP BY expression to be used for the partial
aggregation, so we use equivalence classes to derive grouping expression: in
the example above, the grouping key "b.j" was derived from "a.i".
+XXX I think this is slightly confusing - there is a GROUP BY expression for the
+partial aggregate, but as stated in the query it may not reference the side of
+a join explicitly.
ok, changed.
Also note that in this case the partial aggregate uses the "b.j" as grouping
column although the column does not appear in the query target list. The point
is that "b.j" is needed to evaluate the join condition, and there's no other
way for the partial aggregate to emit its values.
+XXX Not sure I understand what this is trying to say. Firstly, maybe it'd be
+helpful to show targetlists in the EXPLAIN, i.e. do it as VERBOSE. But more
+importantly, isn't this a direct consequence of the equivalence classes stuff
+mentioned in the preceding paragraph?
The equivalence class is just a mechanism to derive expressions which are not
explicitly mentioned in the query, but there's always a question whether you
need to derive any expression for particular table or not. Here I tried to
explain that the choice of join columns is related to the choice of grouping
keys for the partial aggregate.

I've deleted this paragraph and added a note to the previous one.
Besides base relation, the aggregation can also be pushed down to join:
EXPLAIN
@@ -1217,6 +1235,10 @@ Besides base relation, the aggregation can also be pushed down to join:
->  Hash
->  Seq Scan on a
+XXX Aha, so this is pretty-much an answer to my earlier comment, and matches
+my example with three tables. Maybe this suggests the initial reference to
+base relations is a bit confusing.
I tried to use the simplest example to demonstrate the concepts, then extended
it to the partially-aggregated joins.
+XXX I think this is a good explanation of the motivation for this patch, but
+maybe it'd be good to go into more details about how we decide if it's correct
+to actually do the pushdown, data structures etc. Similar to earlier parts of
+this README.
Added two paragraphs, see "Regarding correctness...".
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f00f900ff41..6d2c2f4fc36 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -196,9 +196,10 @@ make_one_rel(PlannerInfo *root, List *joinlist)
/*
* Now that the sizes are known, we can estimate the sizes of the grouped
* relations.
+      *
+      * XXX Seems more consistent with code nearby.
*/
-     if (root->grouped_var_list)
-             setup_base_grouped_rels(root);
+     setup_base_grouped_rels(root);
In general I prefer not calling a function if it's obvious that it's not
needed, but on the other hand the test of the 'grouped_var_list' field may be
considered disturbing from the caller's perspective. I've got no strong
opinion on this, so I can accept this proposal.

/*
- * setup_based_grouped_rels
+ * setup_base_grouped_rels
* For each "plain" relation build a grouped relation if aggregate pushdown
* is possible and if this relation is suitable for partial aggregation.
*/

Fixed, thanks.
{
Index rti;
+     /* If there are no grouped relations, estimate their sizes. */
+     if (!root->grouped_var_list)
+             return;
+
Accepted, but with different wording (s/relations/expressions/).
+             /* XXX Shouldn't this check be earlier? Seems cheaper than the check
+              * calling bms_nonempty_difference, for example. */
if (brel->reloptkind != RELOPT_BASEREL)
continue;
Right, moved.
rel_grouped = build_simple_grouped_rel(root, brel->relid, &agg_info);
-             if (rel_grouped)
-             {
-                     /* Make the relation available for joining. */
-                     add_grouped_rel(root, rel_grouped, agg_info);
-             }
+
+             /* XXX When does this happen? */
+             if (!rel_grouped)
+                     continue;
+
+             /* Make the relation available for joining. */
+             add_grouped_rel(root, rel_grouped, agg_info);
I'd use the "continue" statement if there was a lot of code in the "if
(rel_grouped) {...}" branch, but no strong preference in this case, so
accepted.
}
}

@@ -560,6 +569,8 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Plain relation */
set_plain_rel_pathlist(root, rel, rte);
+                                     /* XXX Shouldn't this really be part of set_plain_rel_pathlist? */
+
/* Add paths to the grouped relation if one exists. */
rel_grouped = find_grouped_rel(root, rel->relids,
Yes, it can. Moved.
@@ -3382,6 +3393,11 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
/*
* Apply partial aggregation to a subpath and add the AggPath to the pathlist.
+ *
+ * XXX I think this is potentially quite confusing, because the existing "add"
+ * functions add_path and add_partial_path only check if the proposed path is
+ * dominated by an existing path, pathkeys, etc. But this does more than that,
+ * perhaps even constructing new path etc.
*/
static void
add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
Maybe, but I don't have a good idea of an alternative name.
create_group_path() already exists and the create_*_path() functions are
rather low-level. Maybe generate_grouped_path(), and at the same time rename
generate_grouping_paths() to generate_grouped_paths()? In general, the
generate_*_path*() functions do non-trivial things and eventually call
add_path().
@@ -3399,9 +3414,16 @@ add_grouped_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
else
elog(ERROR, "unexpected strategy %d", aggstrategy);
+     /*
+      * Bail out if we failed to create a suitable aggregated path. This can
+      * happen e.g. then the path does not support hashing (for AGG_HASHED),
+      * or when the input path is not sorted.
+      */
+     if (agg_path == NULL)
+             return;
+
/* Add the grouped path to the list of grouped base paths. */
-     if (agg_path != NULL)
-             add_path(rel, (Path *) agg_path);
+     add_path(rel, (Path *) agg_path);
ok, changed.
}

/*
@@ -3545,7 +3567,6 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)

for (lev = 2; lev <= levels_needed; lev++)
{
- RelOptInfo *rel_grouped;
ListCell *lc;
/*
@@ -3567,6 +3588,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*/
foreach(lc, root->join_rel_level[lev])
{
+                     RelOptInfo *rel_grouped;
+
rel = (RelOptInfo *) lfirst(lc);
Sure, fixed.
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 8e913c92d8b..d7a9de9645e 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -278,6 +278,8 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
* each possible grouping expression.
*
* root->group_pathkeys must be setup before this function is called.
+ *
+ * XXX Perhaps this should check/reject hasWindowFuncs too?
create_window_paths() is called after create_grouping_paths() (see
grouping_planner()), so it should not care whether the input (possibly
grouped) paths involve the aggregate push-down or not.
*/
extern void
setup_aggregate_pushdown(PlannerInfo *root)
@@ -311,6 +313,12 @@ setup_aggregate_pushdown(PlannerInfo *root)
if (root->parse->hasTargetSRFs)
return;
+     /*
+      * XXX Maybe it'd be better to move create_aggregate_grouped_var_infos and
+      * create_grouping_expr_grouped_var_infos to a function returning bool, and
+      * only check that here.
+      */
+
Hm, it looks to me like too much "indirection", and also a decriptive function
name would be tricky to invent.
/* Create GroupedVarInfo per (distinct) aggregate. */
create_aggregate_grouped_var_infos(root);
@@ -329,6 +337,8 @@ setup_aggregate_pushdown(PlannerInfo *root)
* Now that we know that grouping can be pushed down, search for the
* maximum sortgroupref. The base relations may need it if extra grouping
* expressions get added to them.
+      *
+      * XXX Shouldn't we do that only when adding extra grouping expressions?
*/
Assert(root->max_sortgroupref == 0);
foreach(lc, root->processed_tlist)
We don't know at this (early) stage whether those "extra grouping expression"
will be needed for at least one relation. (max_sortgroupref is used by
create_rel_agg_info())
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 0ada3ba3ebe..2f4db69c1f9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3899,6 +3899,10 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/*
* The non-partial paths can come either from the Gather above or from
* aggregate push-down.
+      *
+      * XXX I can't quite convince myself this is correct. How come it's fine
+      * to check pathlist and then call set_cheapest() on partially_grouped_rel?
+      * Maybe it's correct and the comment merely needs to explain this.
It's not clear to me what makes you confused. Without my patch, the code looks
like this:

if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
{
gather_grouping_paths(root, partially_grouped_rel);
set_cheapest(partially_grouped_rel);
}

Here gather_grouping_paths() adds paths to partially_grouped_rel->pathlist. My
patch calls set_cheapest() independent from gather_grouping_paths() because
the paths requiring the aggregate finalization can also be generated by the
aggregate push-down feature.
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
@@ -6847,6 +6851,12 @@ create_partial_grouping_paths(PlannerInfo *root,
* push-down.
*/
partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+
+     /*
+      * If the relation already exists, it must have been created by aggregate
+      * pushdown. We can't check how exactly it got created, but we can at least
+      * check that aggregate pushdown is enabled.
+      */
Assert(enable_agg_pushdown || partially_grouped_rel == NULL);
ok, done.
@@ -6872,6 +6882,8 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+      *
+      * XXX Not sure why we're checking the partially_grouped_rel here?
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
I think (but not verified yet) that without this test the function could
return NULL for reasons unrelated to the aggregate push-down. Nevertheless, I
realize now that there's no aggregate push-down specific processing in the
function. I've adjusted it so that it does return, but the returned value is
partially_grouped_rel rather than NULL.
@@ -6881,7 +6893,9 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* Build a new upper relation to represent the result of partially
-      * aggregating the rows from the input relation.
+      * aggregating the rows from the input relation. The relation may
+      * already exist due to aggregate pushdown, in which case we don't
+      * need to create it.
*/
if (partially_grouped_rel == NULL)
partially_grouped_rel = fetch_upper_rel(root,
ok, done.
@@ -6903,6 +6917,8 @@ create_partial_grouping_paths(PlannerInfo *root,
*
* If the target was already created for the sake of aggregate push-down,
* it should be compatible with what we'd create here.
+      *
+      * XXX Why is this checking reltarget->exprs? What does that mean?
*/
if (partially_grouped_rel->reltarget->exprs == NIL)
partially_grouped_rel->reltarget =
I've added this comment:

* XXX If fetch_upper_rel() had to create a new relation (i.e. aggregate
* push-down generated no paths), it created an empty target. Should we
* change the convention and have it assign NULL to reltarget instead? Or
* should we introduce a function like is_pathtarget_empty()?
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7025ebf94be..395bd093d34 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3163,9 +3163,21 @@ create_agg_path(PlannerInfo *root,
}
/*
+ * create_agg_sorted_path
+ *           Creates a pathnode performing sorted aggregation/grouping
+ *
* Apply AGG_SORTED aggregation path to subpath if it's suitably sorted.
*
* NULL is returned if sorting of subpath output is not suitable.
+ *
+ * XXX I'm a bit confused why we need this? We now have create_agg_path and also
+ * create_agg_sorted_path and create_agg_hashed_path.
Do you mean that the function names are confusing? The functions
create_agg_sorted_path() and create_agg_hashed_path() do some checks /
preparation for the call of the existing function create_agg_path(), which is
more low-level. Should the names be something like
create_partial_agg_sorted_path() and create_partial_agg_hashed_path() ?
+ *
+ * XXX This assumes the input path to be sorted in a suitable way, but for
+ * regular aggregation we check that separately and then perhaps add sort
+ * if needed (possibly incremental one). That is, we don't do such checks
+ * in create_agg_path. Shouldn't we do the same thing before calling this
+ * new functions?
*/
AggPath *
create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
@@ -3184,6 +3196,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
agg_exprs = agg_info->agg_exprs;
target = agg_info->target;
Likewise, it seems that you'd like to see different function name and maybe
different location of this function. Both create_agg_sorted_path() and
create_agg_hashed_path() are rather wrappers for create_agg_path().

+ /* Bail out if the input path is not sorted at all. */
if (subpath->pathkeys == NIL)
return NULL;

ok, done.
@@ -3192,6 +3205,18 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
/*
* Find all query pathkeys that our relation does affect.
+      *
+      * XXX Not sure what "that our relation does affect" means? Also, we
+      * are not looking at query_pathkeys but group_pathkeys, so that's a
+      * bit confusing. Perhaps something like this would be better:
+      *
Indeed, the check of pathkeys was weird, I've reworked it.
@@ -3210,10 +3235,21 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
}
}

+ /* Bail out if the subquery has no pathkeys for the grouping. */
if (key_subset == NIL)
return NULL;
-     /* Check if AGG_SORTED is useful for the whole query.  */
+     /*
+      * Check if AGG_SORTED is useful for the whole query.
+      *
+      * XXX So this means we require the group pathkeys matched to the
+      * subpath have to be a prefix of subpath->pathkeys. Why is that
+      * necessary? We'll reduce the cardinality, and in the worst case
+      * we'll have to add a separate sort (full or incremental). Or we
+      * could finalize using hashed aggregate.
Although with different arguments, pathkeys_contained_in() is still used in
the new version of the patch. I've added a TODO comment about the incremental
sort (it did not exist when I was writing the patch), but what do you mean by
"reducing the cardinality"? Eventually the partial aggregate should reduce the
cardinality, but for the AGG_SORT strategy to work, the input sorting must be
such that the executor can recognize the group boundaries.
+      *
+      * XXX Doesn't seem to change any regression tests when disabled.
+      */
if (!pathkeys_contained_in(key_subset, subpath->pathkeys))
return NULL;
"disabled" means removal of this part (including the return statement), or
returning NULL unconditionally? Whatever you mean, please check with the new
version.
@@ -3231,7 +3267,7 @@ create_agg_sorted_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
result = create_agg_path(root, rel, subpath, target,
AGG_SORTED, aggsplit,
agg_info->group_clauses,
-                                                      NIL,
+                                                      NIL,   /* qual for HAVING clause */
&agg_costs,
dNumGroups);
ok, done here as well as in create_agg_hashed_path().
@@ -3283,6 +3319,9 @@ create_agg_hashed_path(PlannerInfo *root, RelOptInfo *rel,
&agg_costs,
dNumGroups);
+             /*
+              * XXX But we can spill to disk in hashagg now, no?
+              */
if (hashaggtablesize < work_mem * 1024L)
{
Yes, we can. It wasn't possible while I was writing the patch. Fixed.
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 868d21c351e..6e87ada684b 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -388,6 +388,7 @@
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
+#enable_agg_pushdown = on
Done.
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1055ea70940..05192ca549a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3352,7 +3352,7 @@ generate_grouping_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
RelOptInfo *rel_plain, RelAggInfo *agg_info)
{
ListCell   *lc;
-     Path       *path;
+     Path       *path;       /* XXX why declare at this level, not in the loops */
I usually do it this way, not sure why. Perhaps because it's less typing :-) I
changed that in the next version so that we don't waste time arguing about
unimportant things.

The patch does not apply on top of HEAD as in [1]http://cfbot.cputube.org/patch_41_3764.log, please post a rebased patch:
=== Applying patches on top of PostgreSQL commit ID
5212d447fa53518458cbe609092b347803a667c5 ===
=== applying patch ./v21-fixes.patch
patching file src/backend/optimizer/README
Hunk #1 FAILED at 1186.
1 out of 1 hunk FAILED -- saving rejects to file
src/backend/optimizer/README.rej
patching file src/backend/optimizer/path/allpaths.c
Hunk #1 FAILED at 197.
Hunk #2 FAILED at 341.
Hunk #3 succeeded at 339 with fuzz 1 (offset -11 lines).
Hunk #4 succeeded at 1014 with fuzz 2 (offset 647 lines).
Hunk #5 FAILED at 378.
Hunk #6 FAILED at 563.
Hunk #7 succeeded at 2793 with fuzz 1 (offset 1948 lines).
Hunk #8 FAILED at 867.
Hunk #9 FAILED at 3439.
Hunk #10 FAILED at 3590.
Hunk #11 succeeded at 3430 (offset -182 lines).
7 out of 11 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/allpaths.c.rej
patching file src/backend/optimizer/path/costsize.c

[1]: http://cfbot.cputube.org/patch_41_3764.log

Regards,
Vignesh

Antonin Houska

ah@cybertec.at

over 3 years ago

In reply to: vignesh C (#7)

Re: WIP: Aggregation push-down - take2

vignesh C <vignesh21@gmail.com> wrote:

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

[1] - http://cfbot.cputube.org/patch_41_3764.log

This is the next version (only rebased, no other changes).

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Gregory Stark (as CFM)

stark.cfm@gmail.com

about 3 years ago

In reply to: Antonin Houska (#8)

Re: WIP: Aggregation push-down - take2

On Thu, 5 Jan 2023 at 02:59, Antonin Houska <ah@cybertec.at> wrote:

vignesh C <vignesh21@gmail.com> wrote:

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

And again...

Setting this to Waiting on Author for the moment.

Do you think this patch is likely to be ready for this release or the
next one? Is there specific feedback you're looking for?

patching file src/backend/optimizer/util/relnode.c
Hunk #1 FAILED at 18.
Hunk #2 succeeded at 85 (offset 8 lines).
Hunk #3 succeeded at 405 with fuzz 1 (offset 25 lines).
Hunk #4 succeeded at 595 (offset 63 lines).
Hunk #5 succeeded at 657 (offset 63 lines).
Hunk #6 succeeded at 692 (offset 63 lines).
Hunk #7 succeeded at 731 (offset 63 lines).
Hunk #8 succeeded at 849 (offset 62 lines).
Hunk #9 succeeded at 860 (offset 62 lines).
Hunk #10 succeeded at 873 (offset 62 lines).
Hunk #11 FAILED at 911.
Hunk #12 FAILED at 945.
Hunk #13 succeeded at 2585 (offset 310 lines).
3 out of 13 hunks FAILED -- saving rejects to file
src/backend/optimizer/util/relnode.c.rej
patching file src/backend/optimizer/util/tlist.c
patching file src/backend/utils/misc/guc_tables.c
Hunk #1 succeeded at 946 (offset 1 line).
patching file src/backend/utils/misc/postgresql.conf.sample
Hunk #1 succeeded at 390 (offset 2 lines).
patching file src/include/nodes/pathnodes.h
Hunk #1 succeeded at 386 (offset 10 lines).
Hunk #2 succeeded at 429 (offset 10 lines).
Hunk #3 succeeded at 477 (offset 38 lines).
Hunk #4 succeeded at 1084 (offset 37 lines).
Hunk #5 succeeded at 3117 (offset 146 lines).
patching file src/include/optimizer/clauses.h
patching file src/include/optimizer/pathnode.h
Hunk #2 FAILED at 311.
Hunk #3 FAILED at 344.
2 out of 3 hunks FAILED -- saving rejects to file
src/include/optimizer/pathnode.h.rej

--
Gregory Stark
As Commitfest Manager

#10

Gregory Stark (as CFM)

stark.cfm@gmail.com

about 3 years ago

In reply to: Gregory Stark (as CFM) (#9)

Re: WIP: Aggregation push-down - take2

It looks like in November 2022 Tomas Vondra said:

I did a quick initial review of the v20 patch series.
I plan to do a

more thorough review over the next couple days, if time permits.

In

general I think the patch is in pretty good shape.

Following which Antonin Houska updated the patch responding to his
review comments.

Since then this patch has demonstrated the unfortunate "please rebase
thx" followed by the author rebasing and getting no feedback until
"please rebase again thx"...

So while the patch doesn't currently apply it seems like it really
should be either Needs Review or Ready for Commit.

That said, I suspect this patch has missed the boat for this CF.
Hopefully it will get more attention next release.

I'll move it to the next CF but set it to Needs Review even though it
needs a rebase.

--
Gregory Stark
As Commitfest Manager

#11

Peter Smith

smithpb2250@gmail.com

over 2 years ago

In reply to: Gregory Stark (as CFM) (#10)

Re: WIP: Aggregation push-down - take2

2024-01 Commitfest.

Hi, this patch was marked in CF as "Needs Review", but there has been
no activity on this thread for 9+ months.

Since there seems not much interest, I have changed the status to
"Returned with Feedback" [1]https://commitfest.postgresql.org/46/3764/. Feel free to propose a stronger use case
for the patch and add an entry for the same.

======
[1]: https://commitfest.postgresql.org/46/3764/

Kind Regards,
Peter Smith.

WIP: Aggregation push-down - take2

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: