Expand applicability of aggregate's sortop optimization

Started by Matthias van de Meentover 1 year ago26 messages

boekewurm+postgres@gmail.com

over 1 year ago

1 attachment(s)

Hi,

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
<anything> can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

Alternatively, the current code could be changed to build indexed
paths that append the SORT BY paths to the aggregate's sort operator,
but that'd significantly increase complexity here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachments:

v1-0001-Plan-min-max-like-aggregates-with-index-accesses-.patchapplication/x-patch; name=v1-0001-Plan-min-max-like-aggregates-with-index-accesses-.patchDownload

From 215600d12164f214aae8f0de94b16373bd202d69 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Thu, 25 Apr 2024 15:23:57 +0200
Subject: [PATCH v1] Plan min()/max()-like aggregates with index accesses in
 more cases

When the aggregate's operator is included in the ORDER BY's ordering
opclass, we know they have a common ordering, and thus should get the
same output (or at least same consistency guarantee for that output).

So, check if the ORDER BY orders things by only the aggregated
expression, and check if that ordering shares a btree opclass with
the aggregator's operator. If so, we can use the index.
---
 src/backend/optimizer/plan/planagg.c     | 87 ++++++++++++++++++++----
 src/test/regress/expected/aggregates.out | 65 ++++++++++++++++++
 src/test/regress/sql/aggregates.sql      | 18 +++++
 3 files changed, 156 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..d8479fe286 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +277,71 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			if (list_length(aggref->aggorder) > 1)
+				return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				return false;
+
+			aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+			if (!OidIsValid(aggsortop))
+				return false;		/* not a MIN/MAX aggregate */
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List   *btclasses;
+				ListCell *cell;
+				bool	match = false;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(cell, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+					interpretation = (OpBtreeInterpretation *) lfirst(cell);
+
+					if (!match)
+					{
+						if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+							match = true;
+					}
+					/* Now useless */
+					pfree(interpretation);
+				}
+
+				/* Now not useful anymore */
+				pfree(btclasses);
+
+				if (!match)
+					return false;
+			}
+		}
+		else
+		{
+			aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+			if (!OidIsValid(aggsortop))
+				return false;		/* not a MIN/MAX aggregate */
+		}
 		/* note: we do not care if DISTINCT is mentioned ... */
 
-		/*
-		 * We might implement the optimization when a FILTER clause is present
-		 * by adding the filter to the quals of the generated subquery.  For
-		 * now, just punt.
-		 */
-		if (aggref->aggfilter != NULL)
-			return false;
-
-		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-		if (!OidIsValid(aggsortop))
-			return false;		/* not a MIN/MAX aggregate */
-
-		curTarget = (TargetEntry *) linitial(aggref->args);
 
 		if (contain_mutable_functions((Node *) curTarget->expr))
 			return false;		/* not potentially indexable */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 56c361ccef..904a96eb94 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -978,6 +978,71 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index d28338ba3d..3a595a9fd7 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -362,6 +362,24 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
-- 
2.40.1

Dagfinn Ilmari Mannsåker

ilmari@ilmari.org

over 1 year ago

In reply to: Matthias van de Meent (#1)

Re: Expand applicability of aggregate's sortop optimization

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

PFA the small patch that implements this.

I don't have enough knowledge to have an opinion on most of the patch
other than it looks okay at a glance, but the list API usage could be
updated to more modern variants:

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..d8479fe286 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
if (list_length(aggref->args) != 1)
return false;		/* it couldn't be MIN/MAX */

+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);

This could be linitial_node(TargetEntry, aggref->args).

+			if (list_length(aggref->aggorder) > 1)
+				return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));

This could be linitial_node(SortGroupClause, aggref->aggorder).

+			if (orderClause->sortop != aggsortop)
+			{
+				List   *btclasses;
+				ListCell *cell;
+				bool	match = false;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(cell, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+					interpretation = (OpBtreeInterpretation *) lfirst(cell);

This could be foreach_ptr(OpBtreeInterpretation, interpretation, btclasses),
which also eliminates the need for the explicit `ListCell *` variable
and lfirst() call.

- ilmari

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#1)

Re: Expand applicability of aggregate's sortop optimization

On Wed, 8 May 2024 at 22:13, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
<anything> can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?

I think it would rule out Min(numeric) and Max(numeric). We were
careful not to affect the number of decimal places in the numeric
output when using the moving aggregate inverse transition
infrastructure for WindowFuncs, so I agree we should maintain an
ability to control the aggregate transition order for numeric. (See
do_numeric_discard's maxScale if check)

I don't think floating point types have the same issues here. At least
+1.0 is greater than -1.0.

Are there any strange collation rules that would cause issues if we
did this with the text types?

David

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: David Rowley (#3)

Re: Expand applicability of aggregate's sortop optimization

On Thu, 9 May 2024 at 12:26, David Rowley <dgrowleyml@gmail.com> wrote:

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.

David

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: David Rowley (#4)

Re: Expand applicability of aggregate's sortop optimization

On Thu, 9 May 2024 at 13:08, David Rowley <dgrowleyml@gmail.com> wrote:

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.

Another argument for using this method is that
SupportRequestOptimizeAggref could allow future unrelated
optimisations such as swapping count(<non-nullable-col>) for count(*).
Where <non-nullable-col> is a NOT NULL column and isn't nullable by
any outer join. Doing that could speed some queries up quite a bit as
it may mean fewer columns to deform from the tuple. You could imagine
a fact table with many columns and a few dimensions, often the
dimension columns that you'd expect to use in GROUP BY would appear
before the columns you'd aggregate on.

David

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#1)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 5/8/24 17:13, Matthias van de Meent wrote:

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
<anything> can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

Thanks for the job! I guess you could be more brave and push down also
FILTER statement.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

As I see, the code:
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

Also, I'm unsure about the necessity of looking through the btree
classes. Maybe just to check the commutator to the sortop, like in the
diff attached? Or could you provide an example to support your approach?

--
regards, Andrei Lepikhov

Attachments:

rewritten_patch.difftext/x-patch; charset=UTF-8; name=rewritten_patch.diffDownload

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..e9e95e2b2a 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			return false;		/* not a MIN/MAX aggregate */
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +281,37 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
-		/* note: we do not care if DISTINCT is mentioned ... */
-
-		/*
-		 * We might implement the optimization when a FILTER clause is present
-		 * by adding the filter to the quals of the generated subquery.  For
-		 * now, just punt.
-		 */
-		if (aggref->aggfilter != NULL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			if (list_length(aggref->aggorder) > 1)
+				return false;
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				return false;
+
+			if (orderClause->sortop != aggsortop &&
+				orderClause->sortop != get_commutator(aggsortop))
+				return false;
+		}
 
-		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-		if (!OidIsValid(aggsortop))
-			return false;		/* not a MIN/MAX aggregate */
+		/* note: we do not care if DISTINCT is mentioned ... */
 
-		curTarget = (TargetEntry *) linitial(aggref->args);
 
 		if (contain_mutable_functions((Node *) curTarget->expr))
 			return false;		/* not potentially indexable */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..5d37510d64 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,71 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..5cdf336fc0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,24 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: David Rowley (#4)

Re: Expand applicability of aggregate's sortop optimization

On 5/9/24 08:08, David Rowley wrote:

On Thu, 9 May 2024 at 12:26, David Rowley <dgrowleyml@gmail.com> wrote:

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.

I generally like the idea of a support function. But as I can see, the
can_minmax_aggs() rejects if any of the aggregates don't pass the
checks. The prosupport feature is designed to be applied to each
function separately. How do you think to avoid it?
Also, I don't clearly understand the case you mentioned here - does it
mean that you want to nullify orders for other aggregate types if they
are the same as the incoming order?

--
regards, Andrei Lepikhov

Matthias van de Meent

boekewurm+postgres@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#6)

Re: Expand applicability of aggregate's sortop optimization

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:

On 5/8/24 17:13, Matthias van de Meent wrote:

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
<anything> can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

Thanks for the job! I guess you could be more brave and push down also
FILTER statement.

While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Note that the "aggfilter" clause was not new, but moved up in the code
to make sure we use this local information to bail out (if applicable)
before trying to use the catalogs for bail-out information.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

As I see, the code:
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.

Also, I'm unsure about the necessity of looking through the btree
classes. Maybe just to check the commutator to the sortop, like in the
diff attached? Or could you provide an example to support your approach?

I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#8)

Re: Expand applicability of aggregate's sortop optimization

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:

Thanks for the job! I guess you could be more brave and push down also
FILTER statement.

While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Got it>> As I see, the code:

aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.

IMO it looks like a micro optimisation. But I agree, it is more about
code style - let the committer decide what is better.>> Also, I'm unsure
about the necessity of looking through the btree

classes. Maybe just to check the commutator to the sortop, like in the
diff attached? Or could you provide an example to support your approach?

I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.

Hm, I dubious about that. Can you provide an example which my variant
will not pass but your does that correctly?

--
regards, Andrei Lepikhov

#10

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#7)

Re: Expand applicability of aggregate's sortop optimization

On Wed, 17 Jul 2024 at 17:12, Andrei Lepikhov <lepihov@gmail.com> wrote:

I generally like the idea of a support function. But as I can see, the
can_minmax_aggs() rejects if any of the aggregates don't pass the
checks. The prosupport feature is designed to be applied to each
function separately. How do you think to avoid it?

You wouldn't avoid it. The prosupport function would be called once
for each Aggref in the query. Why is that a problem?

Also, I don't clearly understand the case you mentioned here - does it
mean that you want to nullify orders for other aggregate types if they
are the same as the incoming order?

No, I mean unconditionally nullify Aggref->aggorder and
Aggref->aggdistinct for aggregate functions where ORDER BY / DISTINCT
in the Aggref makes no difference to the result. I think that's ok for
max() and min() for everything besides NUMERIC. For aggorder, we'd
have to *not* optimise sum() and avg() for floating point types as
that could change the result. sum() and avg() for INT2, INT4 and INT8
seems fine. I'd need to check, but I think sum(numeric) is ok too as
the dscale should end up the same regardless of the order. Obviously,
aggdistinct can't be changed for sum() and avg() on any type.

It seems also possible to adjust count(non-nullable-var) into
count(*), which, if done early enough in planning could help
significantly by both reducing evaluation during execution, but also
possibly reduce tuple deformation if that Var has a higher varattno
than anything else in the relation. That would require checking
varnullingrels is empty and the respective RelOptInfo's notnullattnums
mentions the Var.

David

#11

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#8)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:

As I see, the code:
aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

After additional research I think I get the key misunderstanding why you
did so:
As I see, the checks:
if (list_length(aggref->aggorder) > 1)
return false;
if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
return false;

not needed at all. You already have check:
if (list_length(aggref->args) != 1)
and this tells us, that if we have ordering like MIN(x ORDER BY <smth>),
this <smth> ordering contains only aggregate argument x. Because if it
contained some expression, the transformAggregateCall() would add this
expression to agg->args by calling the transformSortClause() routine.
The tleSortGroupRef is just exactly ressortgroupref - no need to recheck
it one more time. Of course, it is suitable only for MIN/MAX aggregates,
but we discuss only them right now. Am I wrong?
If you want, you can place it as assertions (see the diff in attachment).

--
regards, Andrei Lepikhov

Attachments:

rewritten-patch-v2.difftext/plain; charset=UTF-8; name=rewritten-patch-v2.diffDownload

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..e0cbe5c923 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			return false;		/* not a MIN/MAX aggregate */
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +281,35 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
-		/* note: we do not care if DISTINCT is mentioned ... */
+		{
+			SortGroupClause *orderClause;
 
-		/*
-		 * We might implement the optimization when a FILTER clause is present
-		 * by adding the filter to the quals of the generated subquery.  For
-		 * now, just punt.
-		 */
-		if (aggref->aggfilter != NULL)
-			return false;
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
 
-		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-		if (!OidIsValid(aggsortop))
-			return false;		/* not a MIN/MAX aggregate */
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			Assert(orderClause->tleSortGroupRef == curTarget->ressortgroupref);
+
+			if (orderClause->sortop != aggsortop &&
+				orderClause->sortop != get_commutator(aggsortop))
+				return false;
+		}
+
+		/* note: we do not care if DISTINCT is mentioned ... */
 
-		curTarget = (TargetEntry *) linitial(aggref->args);
 
 		if (contain_mutable_functions((Node *) curTarget->expr))
 			return false;		/* not potentially indexable */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..5d37510d64 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,71 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..5cdf336fc0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,24 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test

#12

Matthias van de Meent

boekewurm+postgres@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#9)

Re: Expand applicability of aggregate's sortop optimization

On Wed, 17 Jul 2024 at 16:09, Andrei Lepikhov <lepihov@gmail.com> wrote:

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:

Thanks for the job! I guess you could be more brave and push down also
FILTER statement.

While probably not impossible, I wasn't planning on changing this code
with new optimizations; just expanding the applicability of the
current optimizations.

Got it>> As I see, the code:

aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
if (!OidIsValid(aggsortop))
return false; /* not a MIN/MAX aggregate */

used twice and can be evaluated earlier to avoid duplicated code.

The code is structured like this to make sure we only start accessing
catalogs once we know that all other reasons to bail out from this
optimization indicate we can apply the opimization. You'll notice that
I've tried to put the cheapest checks that only use caller-supplied
information first, and catalog accesses only after that.

If the fetch_agg_sort_op clause would be deduplicated, it would either
increase code complexity to handle both aggref->aggorder paths, or it
would increase the cost of planning MAX(a ORDER BY b) because of the
newly added catalog access.

IMO it looks like a micro optimisation. But I agree, it is more about
code style - let the committer decide what is better.>> Also, I'm unsure
about the necessity of looking through the btree

classes. Maybe just to check the commutator to the sortop, like in the
diff attached? Or could you provide an example to support your approach?

I think it could work, but I'd be hesitant to rely on that, as
commutator registration is optional (useful, but never required for
btree operator classes' operators). Looking at the btree operator
class, which is the definition of sortability in PostgreSQL, seems
more suitable and correct.

Hm, I dubious about that. Can you provide an example which my variant
will not pass but your does that correctly?

Here is one:

"""
CREATE OPERATOR @@> (
function=int4gt, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@>= (
function=int4ge, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@= (
function=int4eq, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@<= (
function=int4le, leftarg=int4, rightarg=int4
); CREATE OPERATOR @@< (
function=int4lt, leftarg=int4, rightarg=int4
);

CREATE OPERATOR CLASS my_int_ops
FOR TYPE int
USING btree AS
OPERATOR 1 @<@,
OPERATOR 2 @<=@,
OPERATOR 3 @=@,
OPERATOR 4 @>=@,
OPERATOR 5 @>@,
FUNCTION 1 btint4cmp;

CREATE AGGREGATE my_int_max (
BASETYPE = int4,
SFUNC = int4larger,
STYPE = int4,
SORTOP = @>@
);

CREATE TABLE my_table AS
SELECT id::int4 FROM generate_series(1, 10000) id;

CREATE INDEX ON my_table (id my_int_ops);

SELECT my_int_max(id ORDER BY id USING @<@ ) from my_table;
"""

Because the @<@ and @>@ operators are not registered as commutative,
it couldn't apply the optimization in your patch, while the btree
operator check does allow it to apply the optimization.

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#13

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#12)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 18/7/2024 19:49, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 16:09, Andrei Lepikhov <lepihov@gmail.com> wrote:

On 17/7/2024 16:33, Matthias van de Meent wrote:

On Wed, 17 Jul 2024 at 05:29, Andrei Lepikhov <lepihov@gmail.com> wrote:

Because the @<@ and @>@ operators are not registered as commutative,
it couldn't apply the optimization in your patch, while the btree
operator check does allow it to apply the optimization.

Ok, I got it.
And next issue: I think it would be better to save cycles than to free
some piece of memory, so why not to break the foreach cycle if you
already matched the opfamily?
Also, in the patch attached I added your smoothed test to the aggregates.sql

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against hereYes, I also think if someone doesn't register < as a commutator to >, it

may mean they do it intentionally: may be it is a bit different
sortings? - this subject is too far from my experience and I can agree
with your approach.

--
regards, Andrei Lepikhov

Attachments:

rewritten-patch-v4.difftext/plain; charset=UTF-8; name=rewritten-patch-v4.diffDownload

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..9b1d6f4e43 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -253,6 +253,20 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		if (list_length(aggref->args) != 1)
 			return false;		/* it couldn't be MIN/MAX */
 
+		/*
+		 * We might implement the optimization when a FILTER clause is present
+		 * by adding the filter to the quals of the generated subquery.  For
+		 * now, just punt.
+		 */
+		if (aggref->aggfilter != NULL)
+			return false;
+
+		curTarget = (TargetEntry *) linitial(aggref->args);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			return false;		/* not a MIN/MAX aggregate */
+
 		/*
 		 * ORDER BY is usually irrelevant for MIN/MAX, but it can change the
 		 * outcome if the aggsortop's operator class recognizes non-identical
@@ -267,22 +281,55 @@ can_minmax_aggs(PlannerInfo *root, List **context)
 		 * quickly.
 		 */
 		if (aggref->aggorder != NIL)
-			return false;
-		/* note: we do not care if DISTINCT is mentioned ... */
-
-		/*
-		 * We might implement the optimization when a FILTER clause is present
-		 * by adding the filter to the quals of the generated subquery.  For
-		 * now, just punt.
-		 */
-		if (aggref->aggfilter != NULL)
-			return false;
+		{
+			SortGroupClause *orderClause;
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			Assert(orderClause->tleSortGroupRef == curTarget->ressortgroupref);
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *cell;
+				bool		match = false;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(cell, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+					interpretation = (OpBtreeInterpretation *) lfirst(cell);
+
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						match = true;
+						break;
+					}
+				}
+
+				if (!match)
+					return false;
+			}
+		}
 
-		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
-		if (!OidIsValid(aggsortop))
-			return false;		/* not a MIN/MAX aggregate */
+		/* note: we do not care if DISTINCT is mentioned ... */
 
-		curTarget = (TargetEntry *) linitial(aggref->args);
 
 		if (contain_mutable_functions((Node *) curTarget->expr))
 			return false;		/* not potentially indexable */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index a5596ab210..94c8cf77f8 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,124 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..19729ae5f2 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,57 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test

#14

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Matthias van de Meent (#12)

Re: Expand applicability of aggregate's sortop optimization

On 18/7/2024 14:49, Matthias van de Meent wrote:

Aside: Arguably, checking for commutator operators would not be
incorrect when looking at it from "applied operators" point of view,
but if that commutative operator isn't registered as opposite ordering
of the same btree opclass, then we'd probably break some assumptions
of some aggregate's sortop - it could be registered with another
opclass, and that could cause us to select a different btree opclass
(thus: ordering) than is indicated to be required by the aggregate;
the thing we're trying to protect against here.

Hi,
This thread stands idle. At the same time, the general idea of this
patch and the idea of utilising prosupport functions look promising. Are
you going to develop this feature further?

--
regards, Andrei Lepikhov

#15

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: David Rowley (#10)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 18/7/2024 00:03, David Rowley wrote:

On Wed, 17 Jul 2024 at 17:12, Andrei Lepikhov <lepihov@gmail.com> wrote:

Also, I don't clearly understand the case you mentioned here - does it
mean that you want to nullify orders for other aggregate types if they
are the same as the incoming order?

No, I mean unconditionally nullify Aggref->aggorder and
Aggref->aggdistinct for aggregate functions where ORDER BY / DISTINCT
in the Aggref makes no difference to the result. I think that's ok for
max() and min() for everything besides NUMERIC. For aggorder, we'd
have to *not* optimise sum() and avg() for floating point types as
that could change the result. sum() and avg() for INT2, INT4 and INT8
seems fine. I'd need to check, but I think sum(numeric) is ok too as
the dscale should end up the same regardless of the order. Obviously,
aggdistinct can't be changed for sum() and avg() on any type.

I have written a sketch patch to implement the idea with aggregate
prosupport in code - see the attachment.
My intention was to provide an example, not a ready-to-commit patch.
As I see, the only problem there lies in the tests - CREATE AGGREGATE
doesn't allow us to set a prosupport routine, and we need to update the
pg_proc table manually.

--
regards, Andrei Lepikhov

Attachments:

0001-Introduce-prosupport-helpers-for-aggregates.patchtext/plain; charset=UTF-8; name=0001-Introduce-prosupport-helpers-for-aggregates.patchDownload

From b272413e64dd45d0b8cff11a18a16bdcc2146cab Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 1 Oct 2024 16:08:11 +0700
Subject: [PATCH] Introduce prosupport helpers for aggregates.

For example, add minmax_support routine to simplify min/max aggregates.
---
 src/backend/optimizer/plan/planagg.c     |  73 +++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/pg_proc.dat          |  93 ++++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 128 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  55 ++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 336 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..93131fa022 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -510,3 +512,74 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestMinMax))
+	{
+		SupportRequestMinMax   *req = (SupportRequestMinMax *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			Assert(orderClause->tleSortGroupRef == curTarget->ressortgroupref);
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index 4606df379a..925d621e9d 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestMinMax	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestMinMax;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 05fcbf7515..9cda3f9b38 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6802,150 +6802,155 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
 { oid => '2115', descr => 'maximum value of all bigint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
-  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'int8', proargtypes => 'int8',
+  prosrc => 'aggregate_dummy' },
 { oid => '2116', descr => 'maximum value of all integer input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2117', descr => 'maximum value of all smallint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2118', descr => 'maximum value of all oid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2119', descr => 'maximum value of all float4 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2120', descr => 'maximum value of all float8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2122', descr => 'maximum value of all date input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2123', descr => 'maximum value of all time input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2124',
   descr => 'maximum value of all time with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2125', descr => 'maximum value of all money input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2126', descr => 'maximum value of all timestamp input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2127',
   descr => 'maximum value of all timestamp with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2128', descr => 'maximum value of all interval input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2129', descr => 'maximum value of all text input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2050', descr => 'maximum value of all anyarray input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
-  prorettype => 'anyarray', proargtypes => 'anyarray',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8595', descr => 'maximum value of all record input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2244', descr => 'maximum value of all bpchar input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2797', descr => 'maximum value of all tid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3564', descr => 'maximum value of all inet input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5099', descr => 'maximum value of all xid8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2132', descr => 'minimum value of all integer input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2133', descr => 'minimum value of all smallint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2134', descr => 'minimum value of all oid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2135', descr => 'minimum value of all float4 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2136', descr => 'minimum value of all float8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2138', descr => 'minimum value of all date input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2139', descr => 'minimum value of all time input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2140',
   descr => 'minimum value of all time with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2141', descr => 'minimum value of all money input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2142', descr => 'minimum value of all timestamp input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2143',
   descr => 'minimum value of all timestamp with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2144', descr => 'minimum value of all interval input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2145', descr => 'minimum value of all text values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2051', descr => 'minimum value of all anyarray input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8596', descr => 'minimum value of all record input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2245', descr => 'minimum value of all bpchar input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2798', descr => 'minimum value of all tid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3565', descr => 'minimum value of all inet input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4190', descr => 'minimum value of all pg_lsn input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5100', descr => 'minimum value of all xid8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 # count has two forms: count(any) and count(*)
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..57c1ed116a 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -343,4 +343,11 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+typedef struct SupportRequestMinMax
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestMinMax;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8ac13b562c..1053e03f25 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,128 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1506,7 +1628,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1535,7 +1657,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1551,7 +1673,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..b6507170eb 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,61 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fabb127d7..1bd08115e3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2781,6 +2781,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestMinMax
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.46.2

#16

Michael Paquier

michael@paquier.xyz

over 1 year ago

In reply to: Andrei Lepikhov (#15)

Re: Expand applicability of aggregate's sortop optimization

On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote:

I have written a sketch patch to implement the idea with aggregate
prosupport in code - see the attachment.
My intention was to provide an example, not a ready-to-commit patch.
As I see, the only problem there lies in the tests - CREATE AGGREGATE
doesn't allow us to set a prosupport routine, and we need to update the
pg_proc table manually.

Please note that the CF bot is red.
--
Michael

#17

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: Michael Paquier (#16)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 10/8/24 08:19, Michael Paquier wrote:

On Tue, Oct 01, 2024 at 04:25:59PM +0700, Andrei Lepikhov wrote:

I have written a sketch patch to implement the idea with aggregate
prosupport in code - see the attachment.
My intention was to provide an example, not a ready-to-commit patch.
As I see, the only problem there lies in the tests - CREATE AGGREGATE
doesn't allow us to set a prosupport routine, and we need to update the
pg_proc table manually.

Please note that the CF bot is red.

Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail.

Also, I see that the union test fails:

  left join lateral
    (select t1.tenthous from tenk2 t2 union all (values(1)))
  on true limit 1;
-                            QUERY PLAN
--------------------------------------------------------------------
- Limit
-   ->  Nested Loop Left Join
-         ->  Seq Scan on tenk1 t1
-         ->  Append
-               ->  Index Only Scan using tenk2_hundred on tenk2 t2
-               ->  Result
-(6 rows)
-
+ERROR:  deadlock detected
+DETAIL:  Process 414506 waits for AccessShareLock on relation 24696 of 
database 16387; blocked by process 414511.
+Process 414511 waits for AccessExclusiveLock on relation 16421 of 
database 16387; blocked by process 414506.
+HINT:  See server log for query details.

But I wonder if new prosupport routine caused that. Please, let me know
if it is not a well-known issue.

--
regards, Andrei Lepikhov

Attachments:

v1-0001-Introduce-prosupport-helpers-for-aggregates.patchtext/x-patch; charset=UTF-8; name=v1-0001-Introduce-prosupport-helpers-for-aggregates.patchDownload

From c3678a52d8cac9293a50ff5a4bab951155d45c5c Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 1 Oct 2024 16:08:11 +0700
Subject: [PATCH v1] Introduce prosupport helpers for aggregates.

For example, add minmax_support routine to simplify min/max aggregates.
---
 src/backend/optimizer/plan/planagg.c     |  76 ++++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/catversion.h         |   2 +-
 src/include/catalog/pg_proc.dat          |  93 ++++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 128 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  55 ++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 8 files changed, 340 insertions(+), 48 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..598da61af8 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -510,3 +512,77 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestMinMax))
+	{
+		SupportRequestMinMax   *req = (SupportRequestMinMax *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				elog(ERROR, "Aggregate order clause isn't found in target list");
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+
+				list_free_deep(btclasses);
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index 4606df379a..925d621e9d 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestMinMax	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestMinMax;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 504bbe5327..4d4a2dd7af 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202410021
+#define CATALOG_VERSION_NO	202410081
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..df0387d4e2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6807,150 +6807,155 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
 { oid => '2115', descr => 'maximum value of all bigint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
-  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'int8', proargtypes => 'int8',
+  prosrc => 'aggregate_dummy' },
 { oid => '2116', descr => 'maximum value of all integer input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2117', descr => 'maximum value of all smallint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2118', descr => 'maximum value of all oid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2119', descr => 'maximum value of all float4 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2120', descr => 'maximum value of all float8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2122', descr => 'maximum value of all date input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2123', descr => 'maximum value of all time input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2124',
   descr => 'maximum value of all time with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2125', descr => 'maximum value of all money input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2126', descr => 'maximum value of all timestamp input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2127',
   descr => 'maximum value of all timestamp with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2128', descr => 'maximum value of all interval input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2129', descr => 'maximum value of all text input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2050', descr => 'maximum value of all anyarray input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
-  prorettype => 'anyarray', proargtypes => 'anyarray',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8595', descr => 'maximum value of all record input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2244', descr => 'maximum value of all bpchar input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2797', descr => 'maximum value of all tid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3564', descr => 'maximum value of all inet input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5099', descr => 'maximum value of all xid8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2132', descr => 'minimum value of all integer input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2133', descr => 'minimum value of all smallint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2134', descr => 'minimum value of all oid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2135', descr => 'minimum value of all float4 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2136', descr => 'minimum value of all float8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2138', descr => 'minimum value of all date input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2139', descr => 'minimum value of all time input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2140',
   descr => 'minimum value of all time with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2141', descr => 'minimum value of all money input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2142', descr => 'minimum value of all timestamp input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2143',
   descr => 'minimum value of all timestamp with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2144', descr => 'minimum value of all interval input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2145', descr => 'minimum value of all text values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2051', descr => 'minimum value of all anyarray input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8596', descr => 'minimum value of all record input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2245', descr => 'minimum value of all bpchar input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2798', descr => 'minimum value of all tid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3565', descr => 'minimum value of all inet input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4190', descr => 'minimum value of all pg_lsn input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5100', descr => 'minimum value of all xid8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 # count has two forms: count(any) and count(*)
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..57c1ed116a 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -343,4 +343,11 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+typedef struct SupportRequestMinMax
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestMinMax;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8ac13b562c..1053e03f25 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,128 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1506,7 +1628,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1535,7 +1657,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1551,7 +1673,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..b6507170eb 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,61 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..823702e819 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2781,6 +2781,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestMinMax
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.39.5

#18

David Rowley

dgrowleyml@gmail.com

over 1 year ago

In reply to: Andrei Lepikhov (#17)

Re: Expand applicability of aggregate's sortop optimization

On Tue, 8 Oct 2024 at 18:47, Andrei Lepikhov <lepihov@gmail.com> wrote:

Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail.

Please leave the cat version bump out of your patch. It's a waste of
time and resource if you plan to post another patch every time a
committer bumps the cat version.

David

#19

Andrei Lepikhov

lepihov@gmail.com

over 1 year ago

In reply to: David Rowley (#18)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 10/8/24 13:06, David Rowley wrote:

On Tue, 8 Oct 2024 at 18:47, Andrei Lepikhov <lepihov@gmail.com> wrote:

Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail.

Please leave the cat version bump out of your patch. It's a waste of
time and resource if you plan to post another patch every time a
committer bumps the cat version.

Thanks, see attachment.

--
regards, Andrei Lepikhov

Attachments:

v2-0001-Introduce-prosupport-helpers-for-aggregates.patchtext/x-patch; charset=UTF-8; name=v2-0001-Introduce-prosupport-helpers-for-aggregates.patchDownload

From b6388e2d4f9f2c2d3118fe5690cf76d396e28af5 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 1 Oct 2024 16:08:11 +0700
Subject: [PATCH v2] Introduce prosupport helpers for aggregates.

For example, add minmax_support routine to simplify min/max aggregates.
---
 src/backend/optimizer/plan/planagg.c     |  76 ++++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/pg_proc.dat          |  93 ++++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 128 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  55 ++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 339 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index afb5445b77..598da61af8 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -510,3 +512,77 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestMinMax))
+	{
+		SupportRequestMinMax   *req = (SupportRequestMinMax *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				elog(ERROR, "Aggregate order clause isn't found in target list");
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+
+				list_free_deep(btclasses);
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index 4606df379a..925d621e9d 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestMinMax	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestMinMax;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..df0387d4e2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6807,150 +6807,155 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
 { oid => '2115', descr => 'maximum value of all bigint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
-  proargtypes => 'int8', prosrc => 'aggregate_dummy' },
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'int8', proargtypes => 'int8',
+  prosrc => 'aggregate_dummy' },
 { oid => '2116', descr => 'maximum value of all integer input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2117', descr => 'maximum value of all smallint input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2118', descr => 'maximum value of all oid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2119', descr => 'maximum value of all float4 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2120', descr => 'maximum value of all float8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2122', descr => 'maximum value of all date input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2123', descr => 'maximum value of all time input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2124',
   descr => 'maximum value of all time with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2125', descr => 'maximum value of all money input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2126', descr => 'maximum value of all timestamp input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2127',
   descr => 'maximum value of all timestamp with time zone input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2128', descr => 'maximum value of all interval input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2129', descr => 'maximum value of all text input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2050', descr => 'maximum value of all anyarray input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f',
-  prorettype => 'anyarray', proargtypes => 'anyarray',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a',
+  proisstrict => 'f', prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8595', descr => 'maximum value of all record input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2244', descr => 'maximum value of all bpchar input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2797', descr => 'maximum value of all tid input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3564', descr => 'maximum value of all inet input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4189', descr => 'maximum value of all pg_lsn input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5099', descr => 'maximum value of all xid8 input values',
-  proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'max', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 { oid => '2131', descr => 'minimum value of all bigint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
 { oid => '2132', descr => 'minimum value of all integer input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
 { oid => '2133', descr => 'minimum value of all smallint input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
 { oid => '2134', descr => 'minimum value of all oid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
 { oid => '2135', descr => 'minimum value of all float4 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
 { oid => '2136', descr => 'minimum value of all float8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
 { oid => '2138', descr => 'minimum value of all date input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
 { oid => '2139', descr => 'minimum value of all time input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
 { oid => '2140',
   descr => 'minimum value of all time with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
 { oid => '2141', descr => 'minimum value of all money input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
 { oid => '2142', descr => 'minimum value of all timestamp input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
 { oid => '2143',
   descr => 'minimum value of all timestamp with time zone input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
 { oid => '2144', descr => 'minimum value of all interval input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
 { oid => '2145', descr => 'minimum value of all text values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 { oid => '2051', descr => 'minimum value of all anyarray input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
 { oid => '8596', descr => 'minimum value of all record input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
 { oid => '2245', descr => 'minimum value of all bpchar input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
 { oid => '2798', descr => 'minimum value of all tid input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
 { oid => '3565', descr => 'minimum value of all inet input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
 { oid => '4190', descr => 'minimum value of all pg_lsn input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
 { oid => '5100', descr => 'minimum value of all xid8 input values',
-  proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
+  proname => 'min', prosupport => 'minmax_support', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
 
 # count has two forms: count(any) and count(*)
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..57c1ed116a 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -343,4 +343,11 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+typedef struct SupportRequestMinMax
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestMinMax;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8ac13b562c..1053e03f25 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,128 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1506,7 +1628,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1535,7 +1657,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1551,7 +1673,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index ca6d1bcfb7..b6507170eb 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,61 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a65e1c07c5..823702e819 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2781,6 +2781,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestMinMax
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.39.5

#20

Andrei Lepikhov

lepihov@gmail.com

about 1 year ago

In reply to: Andrei Lepikhov (#19)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 10/8/24 16:22, Andrei Lepikhov wrote:

On 10/8/24 13:06, David Rowley wrote:

On Tue, 8 Oct 2024 at 18:47, Andrei Lepikhov <lepihov@gmail.com> wrote:

Thanks, I suppose CATALOG_VERSION_NO was the only reason for this fail.

Please leave the cat version bump out of your patch. It's a waste of
time and resource if you plan to post another patch every time a
committer bumps the cat version.

Thanks, see attachment.

Rebase onto current master and slight improvement.

--
regards, Andrei Lepikhov

Attachments:

v3-0001-Add-prosupport-helpers-to-aggregate-functions.patchtext/x-patch; charset=UTF-8; name=v3-0001-Add-prosupport-helpers-to-aggregate-functions.patchDownload

From c2cecc743fa699617fac25e570fa213b0cccc0df Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 8 Nov 2024 10:36:06 +0700
Subject: [PATCH v3] Add prosupport helpers to aggregate functions.

Following the spirit of ed1a88d, add a prosupprto calls for aggregates.
Here we add a new support function node type to allow support functions to
be called for aggregate functions.
With this code the only min/max trivial optimisation is introduced in the core.
But a user can design alternative support functions on their own.
---
 src/backend/optimizer/plan/planagg.c     |  76 ++++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/pg_proc.dat          |  92 ++++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 128 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  55 ++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 338 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index dbcd3b9a0a..845b17f01b 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -512,3 +514,77 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestAggregate))
+	{
+		SupportRequestAggregate   *req = (SupportRequestAggregate *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				elog(ERROR, "Aggregate order clause isn't found in target list");
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+
+				list_free_deep(btclasses);
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index 4606df379a..f872599506 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestAggregate	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestAggregate;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f23321a41f..dc4b5e5c5f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6811,155 +6811,159 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
-{ oid => '2115', descr => 'maximum value of all bigint input values',
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
+{ oid => '2115', prosupport => 'minmax_support',
+  descr => 'maximum value of all bigint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2116', descr => 'maximum value of all integer input values',
+{ oid => '2116', prosupport => 'minmax_support', descr => 'maximum value of all integer input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2117', descr => 'maximum value of all smallint input values',
+{ oid => '2117', prosupport => 'minmax_support', descr => 'maximum value of all smallint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2118', descr => 'maximum value of all oid input values',
+{ oid => '2118', prosupport => 'minmax_support', descr => 'maximum value of all oid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2119', descr => 'maximum value of all float4 input values',
+{ oid => '2119', prosupport => 'minmax_support', descr => 'maximum value of all float4 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2120', descr => 'maximum value of all float8 input values',
+{ oid => '2120', prosupport => 'minmax_support', descr => 'maximum value of all float8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2122', descr => 'maximum value of all date input values',
+{ oid => '2122', prosupport => 'minmax_support', descr => 'maximum value of all date input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2123', descr => 'maximum value of all time input values',
+{ oid => '2123', prosupport => 'minmax_support', descr => 'maximum value of all time input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2124',
+{ oid => '2124', prosupport => 'minmax_support',
   descr => 'maximum value of all time with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2125', descr => 'maximum value of all money input values',
+{ oid => '2125', prosupport => 'minmax_support', descr => 'maximum value of all money input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2126', descr => 'maximum value of all timestamp input values',
+{ oid => '2126', prosupport => 'minmax_support', descr => 'maximum value of all timestamp input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2127',
+{ oid => '2127', prosupport => 'minmax_support',
   descr => 'maximum value of all timestamp with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2128', descr => 'maximum value of all interval input values',
+{ oid => '2128', prosupport => 'minmax_support', descr => 'maximum value of all interval input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2129', descr => 'maximum value of all text input values',
+{ oid => '2129', prosupport => 'minmax_support', descr => 'maximum value of all text input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2050', descr => 'maximum value of all anyarray input values',
+{ oid => '2050', prosupport => 'minmax_support', descr => 'maximum value of all anyarray input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8595', descr => 'maximum value of all record input values',
+{ oid => '8595', prosupport => 'minmax_support', descr => 'maximum value of all record input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2244', descr => 'maximum value of all bpchar input values',
+{ oid => '2244', prosupport => 'minmax_support', descr => 'maximum value of all bpchar input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2797', descr => 'maximum value of all tid input values',
+{ oid => '2797', prosupport => 'minmax_support', descr => 'maximum value of all tid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3564', descr => 'maximum value of all inet input values',
+{ oid => '3564', prosupport => 'minmax_support', descr => 'maximum value of all inet input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4189', descr => 'maximum value of all pg_lsn input values',
+{ oid => '4189', prosupport => 'minmax_support', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5099', descr => 'maximum value of all xid8 input values',
+{ oid => '5099', prosupport => 'minmax_support', descr => 'maximum value of all xid8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8922', descr => 'maximum value of all bytea input values',
+{ oid => '8922', prosupport => 'minmax_support', descr => 'maximum value of all bytea input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
-{ oid => '2131', descr => 'minimum value of all bigint input values',
+{ oid => '2131', prosupport => 'minmax_support', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2132', descr => 'minimum value of all integer input values',
+{ oid => '2132', prosupport => 'minmax_support', descr => 'minimum value of all integer input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2133', descr => 'minimum value of all smallint input values',
+{ oid => '2133', prosupport => 'minmax_support', descr => 'minimum value of all smallint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2134', descr => 'minimum value of all oid input values',
+{ oid => '2134', prosupport => 'minmax_support', descr => 'minimum value of all oid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2135', descr => 'minimum value of all float4 input values',
+{ oid => '2135', prosupport => 'minmax_support', descr => 'minimum value of all float4 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2136', descr => 'minimum value of all float8 input values',
+{ oid => '2136', prosupport => 'minmax_support', descr => 'minimum value of all float8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2138', descr => 'minimum value of all date input values',
+{ oid => '2138', prosupport => 'minmax_support', descr => 'minimum value of all date input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2139', descr => 'minimum value of all time input values',
+{ oid => '2139', prosupport => 'minmax_support', descr => 'minimum value of all time input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2140',
+{ oid => '2140', prosupport => 'minmax_support',
   descr => 'minimum value of all time with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2141', descr => 'minimum value of all money input values',
+{ oid => '2141', prosupport => 'minmax_support', descr => 'minimum value of all money input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2142', descr => 'minimum value of all timestamp input values',
+{ oid => '2142', prosupport => 'minmax_support', descr => 'minimum value of all timestamp input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2143',
+{ oid => '2143', prosupport => 'minmax_support',
   descr => 'minimum value of all timestamp with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2144', descr => 'minimum value of all interval input values',
+{ oid => '2144', prosupport => 'minmax_support', descr => 'minimum value of all interval input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2145', descr => 'minimum value of all text values',
+{ oid => '2145', prosupport => 'minmax_support', descr => 'minimum value of all text values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2051', descr => 'minimum value of all anyarray input values',
+{ oid => '2051', prosupport => 'minmax_support', descr => 'minimum value of all anyarray input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8596', descr => 'minimum value of all record input values',
+{ oid => '8596', prosupport => 'minmax_support', descr => 'minimum value of all record input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2245', descr => 'minimum value of all bpchar input values',
+{ oid => '2245', prosupport => 'minmax_support', descr => 'minimum value of all bpchar input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2798', descr => 'minimum value of all tid input values',
+{ oid => '2798', prosupport => 'minmax_support', descr => 'minimum value of all tid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3565', descr => 'minimum value of all inet input values',
+{ oid => '3565', prosupport => 'minmax_support', descr => 'minimum value of all inet input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4190', descr => 'minimum value of all pg_lsn input values',
+{ oid => '4190', prosupport => 'minmax_support', descr => 'minimum value of all pg_lsn input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5100', descr => 'minimum value of all xid8 input values',
+{ oid => '5100', prosupport => 'minmax_support', descr => 'minimum value of all xid8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8923', descr => 'minimum value of all bytea input values',
+{ oid => '8923', prosupport => 'minmax_support', descr => 'minimum value of all bytea input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..5b9a14c855 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -343,4 +343,11 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+typedef struct SupportRequestAggregate
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestAggregate;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1e682565d1..7d1cf38091 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,128 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+                              QUERY PLAN                               
+-----------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: tenk1.unique1
+           ->  Index Only Scan Backward using idx_int4 on public.tenk1
+                 Output: tenk1.unique1
+                 Index Cond: (tenk1.unique1 IS NOT NULL)
+(8 rows)
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1506,7 +1628,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1535,7 +1657,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1551,7 +1673,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 4885daffe6..60b8957734 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,61 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE INDEX idx_int4 ON tenk1 (unique1 my_int_ops);
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @<@ ) from tenk1;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(unique1::int4 ORDER BY unique1::int4 USING @>@ ) from tenk1;
+
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95..50e2d4fdb8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2783,6 +2783,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestAggregate
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.39.5

#21

Michael Paquier

michael@paquier.xyz

about 1 year ago

In reply to: Andrei Lepikhov (#20)

Re: Expand applicability of aggregate's sortop optimization

On Fri, Nov 08, 2024 at 01:05:04PM +0700, Andrei Lepikhov wrote:

Rebase onto current master and slight improvement.

Your patch is failing in the CI, please rebase. I have it to the next
CF for now, waiting on author.
--
Michael

#22

Andrei Lepikhov

lepihov@gmail.com

about 1 year ago

In reply to: Michael Paquier (#21)

Re: Expand applicability of aggregate's sortop optimization

On 11/12/2024 07:22, Michael Paquier wrote:

On Fri, Nov 08, 2024 at 01:05:04PM +0700, Andrei Lepikhov wrote:

Rebase onto current master and slight improvement.

Your patch is failing in the CI, please rebase. I have it to the next
CF for now, waiting on author.

Thanks! I have observed such issues before, but their complexity is out
of my league yet. Why can a prosupport function cause deadlock in a
simple query? That's the question to discover.

--
regards, Andrei Lepikhov

#23

Andrei Lepikhov

lepihov@gmail.com

about 1 year ago

In reply to: Michael Paquier (#21)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

On 12/11/24 07:22, Michael Paquier wrote:

On Fri, Nov 08, 2024 at 01:05:04PM +0700, Andrei Lepikhov wrote:

Rebase onto current master and slight improvement.

Your patch is failing in the CI, please rebase. I have it to the next
CF for now, waiting on author.

Thanks,
I overlooked the fact that regression tests can be launched in parallel.
Therefore, adding an index to the frequently used tenk1 table was a bad
idea.
In the attachment - the new patch rebased on the current master.

--
regards, Andrei Lepikhov

Attachments:

v4-0001-Add-prosupport-helpers-to-aggregate-functions.patchtext/x-patch; charset=UTF-8; name=v4-0001-Add-prosupport-helpers-to-aggregate-functions.patchDownload

From 880ddf7170a0e01fb852172ba99a32e2494f50c6 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Fri, 8 Nov 2024 10:36:06 +0700
Subject: [PATCH v4] Add prosupport helpers to aggregate functions.

Following the spirit of ed1a88d, add a prosupport call for aggregates.
Here, we introduce a new support function node type to allow support functions
to be called for aggregate functions.
This code introduces only min/max trivial optimisation in the core. However, a
user can design alternative support functions on their own.
---
 src/backend/optimizer/plan/planagg.c     |  76 +++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/pg_proc.dat          |  92 +++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 135 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  62 +++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 352 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index dbcd3b9a0a..845b17f01b 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -512,3 +514,77 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestAggregate))
+	{
+		SupportRequestAggregate   *req = (SupportRequestAggregate *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				elog(ERROR, "Aggregate order clause isn't found in target list");
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+
+				list_free_deep(btclasses);
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index e935c9d094..74347b3d7e 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestAggregate	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestAggregate;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 2dcc2d42da..57a939ec60 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6834,155 +6834,159 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
-{ oid => '2115', descr => 'maximum value of all bigint input values',
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
+{ oid => '2115', prosupport => 'minmax_support',
+  descr => 'maximum value of all bigint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2116', descr => 'maximum value of all integer input values',
+{ oid => '2116', prosupport => 'minmax_support', descr => 'maximum value of all integer input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2117', descr => 'maximum value of all smallint input values',
+{ oid => '2117', prosupport => 'minmax_support', descr => 'maximum value of all smallint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2118', descr => 'maximum value of all oid input values',
+{ oid => '2118', prosupport => 'minmax_support', descr => 'maximum value of all oid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2119', descr => 'maximum value of all float4 input values',
+{ oid => '2119', prosupport => 'minmax_support', descr => 'maximum value of all float4 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2120', descr => 'maximum value of all float8 input values',
+{ oid => '2120', prosupport => 'minmax_support', descr => 'maximum value of all float8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2122', descr => 'maximum value of all date input values',
+{ oid => '2122', prosupport => 'minmax_support', descr => 'maximum value of all date input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2123', descr => 'maximum value of all time input values',
+{ oid => '2123', prosupport => 'minmax_support', descr => 'maximum value of all time input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2124',
+{ oid => '2124', prosupport => 'minmax_support',
   descr => 'maximum value of all time with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2125', descr => 'maximum value of all money input values',
+{ oid => '2125', prosupport => 'minmax_support', descr => 'maximum value of all money input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2126', descr => 'maximum value of all timestamp input values',
+{ oid => '2126', prosupport => 'minmax_support', descr => 'maximum value of all timestamp input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2127',
+{ oid => '2127', prosupport => 'minmax_support',
   descr => 'maximum value of all timestamp with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2128', descr => 'maximum value of all interval input values',
+{ oid => '2128', prosupport => 'minmax_support', descr => 'maximum value of all interval input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2129', descr => 'maximum value of all text input values',
+{ oid => '2129', prosupport => 'minmax_support', descr => 'maximum value of all text input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2050', descr => 'maximum value of all anyarray input values',
+{ oid => '2050', prosupport => 'minmax_support', descr => 'maximum value of all anyarray input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8595', descr => 'maximum value of all record input values',
+{ oid => '8595', prosupport => 'minmax_support', descr => 'maximum value of all record input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2244', descr => 'maximum value of all bpchar input values',
+{ oid => '2244', prosupport => 'minmax_support', descr => 'maximum value of all bpchar input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2797', descr => 'maximum value of all tid input values',
+{ oid => '2797', prosupport => 'minmax_support', descr => 'maximum value of all tid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3564', descr => 'maximum value of all inet input values',
+{ oid => '3564', prosupport => 'minmax_support', descr => 'maximum value of all inet input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4189', descr => 'maximum value of all pg_lsn input values',
+{ oid => '4189', prosupport => 'minmax_support', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5099', descr => 'maximum value of all xid8 input values',
+{ oid => '5099', prosupport => 'minmax_support', descr => 'maximum value of all xid8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8922', descr => 'maximum value of all bytea input values',
+{ oid => '8922', prosupport => 'minmax_support', descr => 'maximum value of all bytea input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
-{ oid => '2131', descr => 'minimum value of all bigint input values',
+{ oid => '2131', prosupport => 'minmax_support', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2132', descr => 'minimum value of all integer input values',
+{ oid => '2132', prosupport => 'minmax_support', descr => 'minimum value of all integer input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2133', descr => 'minimum value of all smallint input values',
+{ oid => '2133', prosupport => 'minmax_support', descr => 'minimum value of all smallint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2134', descr => 'minimum value of all oid input values',
+{ oid => '2134', prosupport => 'minmax_support', descr => 'minimum value of all oid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2135', descr => 'minimum value of all float4 input values',
+{ oid => '2135', prosupport => 'minmax_support', descr => 'minimum value of all float4 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2136', descr => 'minimum value of all float8 input values',
+{ oid => '2136', prosupport => 'minmax_support', descr => 'minimum value of all float8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2138', descr => 'minimum value of all date input values',
+{ oid => '2138', prosupport => 'minmax_support', descr => 'minimum value of all date input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2139', descr => 'minimum value of all time input values',
+{ oid => '2139', prosupport => 'minmax_support', descr => 'minimum value of all time input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2140',
+{ oid => '2140', prosupport => 'minmax_support',
   descr => 'minimum value of all time with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2141', descr => 'minimum value of all money input values',
+{ oid => '2141', prosupport => 'minmax_support', descr => 'minimum value of all money input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2142', descr => 'minimum value of all timestamp input values',
+{ oid => '2142', prosupport => 'minmax_support', descr => 'minimum value of all timestamp input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2143',
+{ oid => '2143', prosupport => 'minmax_support',
   descr => 'minimum value of all timestamp with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2144', descr => 'minimum value of all interval input values',
+{ oid => '2144', prosupport => 'minmax_support', descr => 'minimum value of all interval input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2145', descr => 'minimum value of all text values',
+{ oid => '2145', prosupport => 'minmax_support', descr => 'minimum value of all text values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2051', descr => 'minimum value of all anyarray input values',
+{ oid => '2051', prosupport => 'minmax_support', descr => 'minimum value of all anyarray input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8596', descr => 'minimum value of all record input values',
+{ oid => '8596', prosupport => 'minmax_support', descr => 'minimum value of all record input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2245', descr => 'minimum value of all bpchar input values',
+{ oid => '2245', prosupport => 'minmax_support', descr => 'minimum value of all bpchar input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2798', descr => 'minimum value of all tid input values',
+{ oid => '2798', prosupport => 'minmax_support', descr => 'minimum value of all tid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3565', descr => 'minimum value of all inet input values',
+{ oid => '3565', prosupport => 'minmax_support', descr => 'minimum value of all inet input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4190', descr => 'minimum value of all pg_lsn input values',
+{ oid => '4190', prosupport => 'minmax_support', descr => 'minimum value of all pg_lsn input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5100', descr => 'minimum value of all xid8 input values',
+{ oid => '5100', prosupport => 'minmax_support', descr => 'minimum value of all xid8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8923', descr => 'minimum value of all bytea input values',
+{ oid => '8923', prosupport => 'minmax_support', descr => 'minimum value of all bytea input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 5f7bcde891..5b9a14c855 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -343,4 +343,11 @@ typedef struct SupportRequestOptimizeWindowClause
 								 * optimizations are possible. */
 } SupportRequestOptimizeWindowClause;
 
+typedef struct SupportRequestAggregate
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestAggregate;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index f2fb66388c..cd5cac4f52 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,135 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE TABLE grouping_prosupport_test (x integer);
+SET enable_seqscan = 'off'; -- Avoid time consuming data generation
+CREATE INDEX idx_int4 ON grouping_prosupport_test (x my_int_ops);
+VACUUM ANALYZE grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @<@ )
+FROM grouping_prosupport_test;
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: grouping_prosupport_test.x
+           ->  Index Only Scan Backward using idx_int4 on public.grouping_prosupport_test
+                 Output: grouping_prosupport_test.x
+                 Index Cond: (grouping_prosupport_test.x IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @>@ )
+FROM grouping_prosupport_test;
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: grouping_prosupport_test.x
+           ->  Index Only Scan Backward using idx_int4 on public.grouping_prosupport_test
+                 Output: grouping_prosupport_test.x
+                 Index Cond: (grouping_prosupport_test.x IS NOT NULL)
+(8 rows)
+
+RESET enable_seqscan;
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+DROP TABLE grouping_prosupport_test;
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1573,7 +1702,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1602,7 +1731,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1618,7 +1747,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 77168bcc74..8dd8503bb8 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,68 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE TABLE grouping_prosupport_test (x integer);
+SET enable_seqscan = 'off'; -- Avoid time consuming data generation
+CREATE INDEX idx_int4 ON grouping_prosupport_test (x my_int_ops);
+VACUUM ANALYZE grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @<@ )
+FROM grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @>@ )
+FROM grouping_prosupport_test;
+
+RESET enable_seqscan;
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+DROP TABLE grouping_prosupport_test;
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f8..53b5136538 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2793,6 +2793,7 @@ SubscriptionRelState
 SummarizerReadLocalXLogPrivate
 SupportRequestCost
 SupportRequestIndexCondition
+SupportRequestAggregate
 SupportRequestOptimizeWindowClause
 SupportRequestRows
 SupportRequestSelectivity
-- 
2.39.5

#24

Andrei Lepikhov

lepihov@gmail.com

10 months ago

In reply to: Andrei Lepikhov (#23)

1 attachment(s)

Re: Expand applicability of aggregate's sortop optimization

Rebase on current master.

--
regards, Andrei Lepikhov

Attachments:

v5-0001-Add-prosupport-helpers-to-aggregate-functions.patchtext/plain; charset=UTF-8; name=v5-0001-Add-prosupport-helpers-to-aggregate-functions.patchDownload

From 79dee863be9c3f400c04d74f4e8493c8929eefbe Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" <lepihov@gmail.com>
Date: Tue, 4 Mar 2025 15:24:28 +0100
Subject: [PATCH v5] Add prosupport helpers to aggregate functions.

Following the spirit of ed1a88d, add a prosupport call for aggregates.
Here, we introduce a new support function node type to allow support functions
to be called for aggregate functions.
This code introduces only min/max trivial optimisation in the core. However, a
user can design alternative support functions on their own.
---
 src/backend/optimizer/plan/planagg.c     |  76 +++++++++++++
 src/backend/optimizer/prep/prepagg.c     |  26 +++++
 src/include/catalog/pg_proc.dat          |  92 +++++++--------
 src/include/nodes/supportnodes.h         |   7 ++
 src/test/regress/expected/aggregates.out | 135 ++++++++++++++++++++++-
 src/test/regress/sql/aggregates.sql      |  62 +++++++++++
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 352 insertions(+), 47 deletions(-)

diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 64605be3178..2a9823279df 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/pathnode.h"
@@ -43,6 +44,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/fmgrprotos.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -512,3 +514,77 @@ fetch_agg_sort_op(Oid aggfnoid)
 
 	return aggsortop;
 }
+
+Datum
+minmax_support(PG_FUNCTION_ARGS)
+{
+	Node   *rawreq = (Node *) PG_GETARG_POINTER(0);
+	Oid		aggsortop;
+
+	if (IsA(rawreq, SupportRequestAggregate))
+	{
+		SupportRequestAggregate   *req = (SupportRequestAggregate *) rawreq;
+		Aggref				   *aggref = req->aggref;
+
+		if (list_length(aggref->args) != 1 || aggref->aggfilter != NULL)
+			PG_RETURN_POINTER(NULL);
+
+		aggsortop = fetch_agg_sort_op(aggref->aggfnoid);
+		if (!OidIsValid(aggsortop))
+			PG_RETURN_POINTER(NULL);		/* not a MIN/MAX aggregate */
+
+		if (aggref->aggorder != NIL)
+		{
+			SortGroupClause	   *orderClause;
+			TargetEntry		   *curTarget;
+
+			curTarget = (TargetEntry *) linitial(aggref->args);
+
+			/*
+			 * If the order clause is the same column as the one we're
+			 * aggregating, we can still use the index: It is undefined which
+			 * value is MIN() or MAX(), as well as which value is first or
+			 * last when sorted. So, we can still use the index IFF the
+			 * aggregated expression equals the expression used in the
+			 * ordering operation.
+			 */
+
+			/*
+			 * We only accept a single argument to min/max aggregates,
+			 * orderings that have more clauses won't provide correct results.
+			 */
+			Assert(list_length(aggref->aggorder) == 1);
+
+			orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));
+
+			if (orderClause->tleSortGroupRef != curTarget->ressortgroupref)
+				elog(ERROR, "Aggregate order clause isn't found in target list");
+
+			if (orderClause->sortop != aggsortop)
+			{
+				List	   *btclasses;
+				ListCell   *lc;
+
+				btclasses = get_op_btree_interpretation(orderClause->sortop);
+
+				foreach(lc, btclasses)
+				{
+					OpBtreeInterpretation *interpretation;
+
+					interpretation = (OpBtreeInterpretation *) lfirst(lc);
+					if (op_in_opfamily(aggsortop, interpretation->opfamily_id))
+					{
+						aggref->aggorder = NIL;
+						break;
+					}
+				}
+
+				list_free_deep(btclasses);
+			}
+			else
+				aggref->aggorder = NIL;
+		}
+	}
+
+	PG_RETURN_POINTER(NULL);
+}
diff --git a/src/backend/optimizer/prep/prepagg.c b/src/backend/optimizer/prep/prepagg.c
index c0a2f04a8c3..52af0ba79db 100644
--- a/src/backend/optimizer/prep/prepagg.c
+++ b/src/backend/optimizer/prep/prepagg.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/pathnodes.h"
+#include "nodes/supportnodes.h"
 #include "optimizer/cost.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/plancat.h"
@@ -64,6 +65,25 @@ static int	find_compatible_trans(PlannerInfo *root, Aggref *newagg,
 								  List *transnos);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
+static void
+optimize_aggregates(Aggref *aggref)
+{
+	SupportRequestAggregate	req;
+	Oid						prosupport;
+
+	prosupport = get_func_support(aggref->aggfnoid);
+
+	/* Check if there's a support function for the aggregate */
+	if (!OidIsValid(prosupport))
+		return;
+
+
+	req.type = T_SupportRequestAggregate;
+	req.aggref = aggref;
+	/* call the support function */
+	(void) OidFunctionCall1(prosupport, PointerGetDatum(&req));
+}
+
 /* -----------------
  * Resolve the transition type of all Aggrefs, and determine which Aggrefs
  * can share aggregate or transition state.
@@ -215,6 +235,12 @@ preprocess_aggref(Aggref *aggref, PlannerInfo *root)
 
 	ReleaseSysCache(aggTuple);
 
+	/*
+	 * See if any modifications can be made to each aggregate to allow
+	 * planner to process it in more effective way.
+	 */
+	optimize_aggregates(aggref);
+
 	/*
 	 * 1. See if this is identical to another aggregate function call that
 	 * we've seen already.
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..fe5a0dc9fb0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6859,155 +6859,159 @@
   proname => 'sum', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
 
-{ oid => '2115', descr => 'maximum value of all bigint input values',
+{ oid => '2775', descr => 'planner support for min and max aggregates',
+  proname => 'minmax_support', prorettype => 'internal',
+  proargtypes => 'internal', prosrc => 'minmax_support' },
+{ oid => '2115', prosupport => 'minmax_support',
+  descr => 'maximum value of all bigint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2116', descr => 'maximum value of all integer input values',
+{ oid => '2116', prosupport => 'minmax_support', descr => 'maximum value of all integer input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2117', descr => 'maximum value of all smallint input values',
+{ oid => '2117', prosupport => 'minmax_support', descr => 'maximum value of all smallint input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2118', descr => 'maximum value of all oid input values',
+{ oid => '2118', prosupport => 'minmax_support', descr => 'maximum value of all oid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2119', descr => 'maximum value of all float4 input values',
+{ oid => '2119', prosupport => 'minmax_support', descr => 'maximum value of all float4 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2120', descr => 'maximum value of all float8 input values',
+{ oid => '2120', prosupport => 'minmax_support', descr => 'maximum value of all float8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2122', descr => 'maximum value of all date input values',
+{ oid => '2122', prosupport => 'minmax_support', descr => 'maximum value of all date input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2123', descr => 'maximum value of all time input values',
+{ oid => '2123', prosupport => 'minmax_support', descr => 'maximum value of all time input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2124',
+{ oid => '2124', prosupport => 'minmax_support',
   descr => 'maximum value of all time with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2125', descr => 'maximum value of all money input values',
+{ oid => '2125', prosupport => 'minmax_support', descr => 'maximum value of all money input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2126', descr => 'maximum value of all timestamp input values',
+{ oid => '2126', prosupport => 'minmax_support', descr => 'maximum value of all timestamp input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2127',
+{ oid => '2127', prosupport => 'minmax_support',
   descr => 'maximum value of all timestamp with time zone input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2128', descr => 'maximum value of all interval input values',
+{ oid => '2128', prosupport => 'minmax_support', descr => 'maximum value of all interval input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2129', descr => 'maximum value of all text input values',
+{ oid => '2129', prosupport => 'minmax_support', descr => 'maximum value of all text input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2130', descr => 'maximum value of all numeric input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2050', descr => 'maximum value of all anyarray input values',
+{ oid => '2050', prosupport => 'minmax_support', descr => 'maximum value of all anyarray input values',
   proname => 'max', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8595', descr => 'maximum value of all record input values',
+{ oid => '8595', prosupport => 'minmax_support', descr => 'maximum value of all record input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2244', descr => 'maximum value of all bpchar input values',
+{ oid => '2244', prosupport => 'minmax_support', descr => 'maximum value of all bpchar input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2797', descr => 'maximum value of all tid input values',
+{ oid => '2797', prosupport => 'minmax_support', descr => 'maximum value of all tid input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3564', descr => 'maximum value of all inet input values',
+{ oid => '3564', prosupport => 'minmax_support', descr => 'maximum value of all inet input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4189', descr => 'maximum value of all pg_lsn input values',
+{ oid => '4189', prosupport => 'minmax_support', descr => 'maximum value of all pg_lsn input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5099', descr => 'maximum value of all xid8 input values',
+{ oid => '5099', prosupport => 'minmax_support', descr => 'maximum value of all xid8 input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8922', descr => 'maximum value of all bytea input values',
+{ oid => '8922', prosupport => 'minmax_support', descr => 'maximum value of all bytea input values',
   proname => 'max', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
-{ oid => '2131', descr => 'minimum value of all bigint input values',
+{ oid => '2131', prosupport => 'minmax_support', descr => 'minimum value of all bigint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int8',
   proargtypes => 'int8', prosrc => 'aggregate_dummy' },
-{ oid => '2132', descr => 'minimum value of all integer input values',
+{ oid => '2132', prosupport => 'minmax_support', descr => 'minimum value of all integer input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int4',
   proargtypes => 'int4', prosrc => 'aggregate_dummy' },
-{ oid => '2133', descr => 'minimum value of all smallint input values',
+{ oid => '2133', prosupport => 'minmax_support', descr => 'minimum value of all smallint input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'int2',
   proargtypes => 'int2', prosrc => 'aggregate_dummy' },
-{ oid => '2134', descr => 'minimum value of all oid input values',
+{ oid => '2134', prosupport => 'minmax_support', descr => 'minimum value of all oid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'oid',
   proargtypes => 'oid', prosrc => 'aggregate_dummy' },
-{ oid => '2135', descr => 'minimum value of all float4 input values',
+{ oid => '2135', prosupport => 'minmax_support', descr => 'minimum value of all float4 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float4',
   proargtypes => 'float4', prosrc => 'aggregate_dummy' },
-{ oid => '2136', descr => 'minimum value of all float8 input values',
+{ oid => '2136', prosupport => 'minmax_support', descr => 'minimum value of all float8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'float8',
   proargtypes => 'float8', prosrc => 'aggregate_dummy' },
-{ oid => '2138', descr => 'minimum value of all date input values',
+{ oid => '2138', prosupport => 'minmax_support', descr => 'minimum value of all date input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'date',
   proargtypes => 'date', prosrc => 'aggregate_dummy' },
-{ oid => '2139', descr => 'minimum value of all time input values',
+{ oid => '2139', prosupport => 'minmax_support', descr => 'minimum value of all time input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'time',
   proargtypes => 'time', prosrc => 'aggregate_dummy' },
-{ oid => '2140',
+{ oid => '2140', prosupport => 'minmax_support',
   descr => 'minimum value of all time with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'timetz',
   proargtypes => 'timetz', prosrc => 'aggregate_dummy' },
-{ oid => '2141', descr => 'minimum value of all money input values',
+{ oid => '2141', prosupport => 'minmax_support', descr => 'minimum value of all money input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'money',
   proargtypes => 'money', prosrc => 'aggregate_dummy' },
-{ oid => '2142', descr => 'minimum value of all timestamp input values',
+{ oid => '2142', prosupport => 'minmax_support', descr => 'minimum value of all timestamp input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamp', proargtypes => 'timestamp',
   prosrc => 'aggregate_dummy' },
-{ oid => '2143',
+{ oid => '2143', prosupport => 'minmax_support',
   descr => 'minimum value of all timestamp with time zone input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'timestamptz', proargtypes => 'timestamptz',
   prosrc => 'aggregate_dummy' },
-{ oid => '2144', descr => 'minimum value of all interval input values',
+{ oid => '2144', prosupport => 'minmax_support', descr => 'minimum value of all interval input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'interval', proargtypes => 'interval',
   prosrc => 'aggregate_dummy' },
-{ oid => '2145', descr => 'minimum value of all text values',
+{ oid => '2145', prosupport => 'minmax_support', descr => 'minimum value of all text values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'text',
   proargtypes => 'text', prosrc => 'aggregate_dummy' },
 { oid => '2146', descr => 'minimum value of all numeric input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'numeric',
   proargtypes => 'numeric', prosrc => 'aggregate_dummy' },
-{ oid => '2051', descr => 'minimum value of all anyarray input values',
+{ oid => '2051', prosupport => 'minmax_support', descr => 'minimum value of all anyarray input values',
   proname => 'min', prokind => 'a', proisstrict => 'f',
   prorettype => 'anyarray', proargtypes => 'anyarray',
   prosrc => 'aggregate_dummy' },
-{ oid => '8596', descr => 'minimum value of all record input values',
+{ oid => '8596', prosupport => 'minmax_support', descr => 'minimum value of all record input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'record',
   proargtypes => 'record', prosrc => 'aggregate_dummy' },
-{ oid => '2245', descr => 'minimum value of all bpchar input values',
+{ oid => '2245', prosupport => 'minmax_support', descr => 'minimum value of all bpchar input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bpchar',
   proargtypes => 'bpchar', prosrc => 'aggregate_dummy' },
-{ oid => '2798', descr => 'minimum value of all tid input values',
+{ oid => '2798', prosupport => 'minmax_support', descr => 'minimum value of all tid input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'tid',
   proargtypes => 'tid', prosrc => 'aggregate_dummy' },
-{ oid => '3565', descr => 'minimum value of all inet input values',
+{ oid => '3565', prosupport => 'minmax_support', descr => 'minimum value of all inet input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'inet',
   proargtypes => 'inet', prosrc => 'aggregate_dummy' },
-{ oid => '4190', descr => 'minimum value of all pg_lsn input values',
+{ oid => '4190', prosupport => 'minmax_support', descr => 'minimum value of all pg_lsn input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'pg_lsn',
   proargtypes => 'pg_lsn', prosrc => 'aggregate_dummy' },
-{ oid => '5100', descr => 'minimum value of all xid8 input values',
+{ oid => '5100', prosupport => 'minmax_support', descr => 'minimum value of all xid8 input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'xid8',
   proargtypes => 'xid8', prosrc => 'aggregate_dummy' },
-{ oid => '8923', descr => 'minimum value of all bytea input values',
+{ oid => '8923', prosupport => 'minmax_support', descr => 'minimum value of all bytea input values',
   proname => 'min', prokind => 'a', proisstrict => 'f', prorettype => 'bytea',
   proargtypes => 'bytea', prosrc => 'aggregate_dummy' },
 
diff --git a/src/include/nodes/supportnodes.h b/src/include/nodes/supportnodes.h
index 9c047cc401b..370be4b6770 100644
--- a/src/include/nodes/supportnodes.h
+++ b/src/include/nodes/supportnodes.h
@@ -390,4 +390,11 @@ typedef struct SupportRequestModifyInPlace
 	int			paramid;		/* ID of Param(s) representing variable */
 } SupportRequestModifyInPlace;
 
+typedef struct SupportRequestAggregate
+{
+	NodeTag		type;
+
+	Aggref	   *aggref;
+} SupportRequestAggregate;
+
 #endif							/* SUPPORTNODES_H */
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 8c4f8ce27ed..302d350e481 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -1003,6 +1003,135 @@ select max(unique1) from tenk1 where unique1 > 42;
  9999
 (1 row)
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+ min 
+-----
+   0
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Result
+   InitPlan 1
+     ->  Limit
+           ->  Index Only Scan Backward using tenk1_unique1 on tenk1
+                 Index Cond: (unique1 IS NOT NULL)
+(5 rows)
+
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+          QUERY PLAN           
+-------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
+select max(unique1 ORDER BY tenthous) from tenk1;
+ max  
+------
+ 9999
+(1 row)
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE TABLE grouping_prosupport_test (x integer);
+SET enable_seqscan = 'off'; -- Avoid time consuming data generation
+CREATE INDEX idx_int4 ON grouping_prosupport_test (x my_int_ops);
+VACUUM ANALYZE grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @<@ )
+FROM grouping_prosupport_test;
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: grouping_prosupport_test.x
+           ->  Index Only Scan Backward using idx_int4 on public.grouping_prosupport_test
+                 Output: grouping_prosupport_test.x
+                 Index Cond: (grouping_prosupport_test.x IS NOT NULL)
+(8 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @>@ )
+FROM grouping_prosupport_test;
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
+ Result
+   Output: (InitPlan 1).col1
+   InitPlan 1
+     ->  Limit
+           Output: grouping_prosupport_test.x
+           ->  Index Only Scan Backward using idx_int4 on public.grouping_prosupport_test
+                 Output: grouping_prosupport_test.x
+                 Index Cond: (grouping_prosupport_test.x IS NOT NULL)
+(8 rows)
+
+RESET enable_seqscan;
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+NOTICE:  drop cascades to index idx_int4
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+DROP TABLE grouping_prosupport_test;
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+             QUERY PLAN              
+-------------------------------------
+ Aggregate
+   ->  Sort
+         Sort Key: unique1, tenthous
+         ->  Seq Scan on tenk1
+(4 rows)
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
@@ -1573,7 +1702,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1602,7 +1731,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: four
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
@@ -1618,7 +1747,7 @@ from tenk1;
 -------------------------------
  Aggregate
    ->  Sort
-         Sort Key: ten
+         Sort Key: two
          ->  Seq Scan on tenk1
 (4 rows)
 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index a1dc94bff57..f09133097ae 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -368,6 +368,68 @@ explain (costs off)
   select max(unique1) from tenk1 where unique1 > 42;
 select max(unique1) from tenk1 where unique1 > 42;
 
+-- When sorting on the column that's being aggregated, indexes can also be
+-- used, but only when the aggregate's operator has the same ordering behavior
+-- as the ORDER BY-clause, i.e. if it is in the same btree opclass as the one
+-- chosen for the ORDER BY clause.
+explain (costs off)
+  select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+select min(unique1 ORDER BY unique1 ASC NULLS LAST) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY unique1 USING <) from tenk1;
+select max(unique1 ORDER BY unique1 USING <) from tenk1;
+explain (costs off)
+  select max(unique1 ORDER BY tenthous) from tenk1;
+select max(unique1 ORDER BY tenthous) from tenk1;
+
+CREATE OPERATOR @>@ (function=int4gt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @<@ (function=int4lt, leftarg=int4, rightarg=int4);
+CREATE OPERATOR @=@ (function=int4eq, leftarg=int4, rightarg=int4);
+
+CREATE OPERATOR FAMILY my_family USING btree;
+CREATE OPERATOR CLASS my_int_ops
+  FOR TYPE int
+  USING btree FAMILY my_family AS
+  OPERATOR 1 @<@ FOR SEARCH,
+  OPERATOR 5 @>@ FOR SEARCH,
+  OPERATOR 3 @=@,
+  FUNCTION 1 btint4cmp;
+
+CREATE AGGREGATE my_int_max (
+  BASETYPE = int4,
+  SFUNC = int4larger,
+  STYPE = int4,
+  SORTOP = @>@
+);
+
+-- Dirty hack in absence of prosupport declaration in the 'CREATE AGGREGATE'
+-- statement.
+UPDATE pg_proc SET prosupport = 'minmax_support'::regproc
+  WHERE proname = 'my_int_max';
+CREATE TABLE grouping_prosupport_test (x integer);
+SET enable_seqscan = 'off'; -- Avoid time consuming data generation
+CREATE INDEX idx_int4 ON grouping_prosupport_test (x my_int_ops);
+VACUUM ANALYZE grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @<@ )
+FROM grouping_prosupport_test;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT my_int_max(x::int4 ORDER BY x::int4 USING @>@ )
+FROM grouping_prosupport_test;
+
+RESET enable_seqscan;
+DROP AGGREGATE my_int_max(int4);
+DROP OPERATOR CLASS my_int_ops USING btree CASCADE;
+DROP OPERATOR FAMILY my_family USING btree;
+DROP OPERATOR @>@ (int4, int4);
+DROP OPERATOR @<@ (int4, int4);
+DROP OPERATOR @=@ (int4, int4);
+DROP TABLE grouping_prosupport_test;
+
+--  But even then, the index can't be used if we order by multiple columns.
+explain (costs off)
+  select max(unique1 ORDER BY unique1, tenthous) from tenk1;
+
 -- the planner may choose a generic aggregate here if parallel query is
 -- enabled, since that plan will be parallel safe and the "optimized"
 -- plan, which has almost identical cost, will not be.  we want to test
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 19ff271ba50..e1166133571 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2823,6 +2823,7 @@ Subscription
 SubscriptionInfo
 SubscriptionRelState
 SummarizerReadLocalXLogPrivate
+SupportRequestAggregate
 SupportRequestCost
 SupportRequestIndexCondition
 SupportRequestModifyInPlace
-- 
2.48.1

#25

Tom Lane

tgl@sss.pgh.pa.us

9 months ago

In reply to: Andrei Lepikhov (#24)

Re: Expand applicability of aggregate's sortop optimization

Andrei Lepikhov <lepihov@gmail.com> writes:

Rebase on current master.

I started to look at v5, and was immediately put off by the fact
that its documentation of the new support request type consists of
precisely zero words. You have the wrong idea of what is required
when adding a support request. The specification of what the request
does and what it returns is your PRIMARY work product, not something
you can just blow off and expect people to reverse-engineer over and
over again in the future. Please look at the existing entries in
supportnodes.h and note that they all have quite specific and detailed
comments.

minmax_support() is pretty short on comments as well. The one comment
that's there is unintelligible because it starts out by talking about
an index ... the reader is likely to say "what index?" Also, I am
far from convinced that the code is right: the fact that the aggsortop
shares a btree opfamily with orderClause->sortop doesn't seem to be
enough to justify removing the aggorder clause. What if they are
opposite sort directions?

Actually ... even if those operators do match, is it okay to just
remove the aggorder clause? What happens if we do not end up using
an index-based implementation? The code really needs to include
a defense of why it's okay to do what it's doing.

Also, that one comment says "So, we can still use the index IFF the
aggregated expression equals the expression used in the ordering
operation". But I see no check that they match.

I kind of wonder whether the case that this is attempting to optimize
actually occurs in the field often enough to justify expending this
many cycles looking for it. MIN(x ORDER BY x) does not seem like
something that a rational person would write.

BTW, it looks to me like that Assert would fail on MIN(x ORDER BY x,y)
which is even less sensible to write, but that doesn't make an
assertion failure okay.

I wonder whether you picked the best spot to insert the hook call
in preprocess_aggref. In particular, the hook can do nothing that
would affect the polymorphic-result-type resolution step, which
is maybe not a restriction in practice but I'm not entirely sure.
I'm a little inclined to call it before we start digging information
out of the Aggref, rather than after.

regards, tom lane

#26

Matthias van de Meent

boekewurm+postgres@gmail.com

6 months ago

In reply to: Andrei Lepikhov (#24)

Re: Expand applicability of aggregate's sortop optimization

On Tue, 4 Mar 2025 at 15:26, Andrei Lepikhov <lepihov@gmail.com> wrote:

Rebase on current master.

FYI, as this is a significantly different patch than the one I started
this thread with, I'm closing my CF entry. Feel free to open a new one
for your patch.

Kind regards,

Matthias van de Meent