Add enable_groupagg GUC parameter to control GroupAggregate usage
Hi hackers,
When I measured the execution time of a certain query with parallel query
enabled and disabled, I found that the execution time was slower when
parallel query was enabled.
To improve the performance of the parallel query, I considered adjusting
the execution plan and attempted to switch from GroupAggregate to
HashAggregate. However, I noticed that there was no GUC parameter to
disable GroupAggregate.
Therefore, I propose adding a new GUC parameter: enable_groupagg.
Below are the results of a performance test where I disabled
GroupAggregate using enable_groupagg. In this case, the planner chose
HashAggregate instead, which improved performance by about 35 times.
# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds
# Query and Data Used (attached to this email)
- Query: test_query.sql
- Data: create_table.sql
# The steps to run the test are as follows.
For example, on psql:
1. Create tables:
\i create_table.sql
2. Execute a query:
\i test_query.sql
3. Execute a query using the new GUC parameter:
set enable_groupagg to off;
\i test_query.sql
As a benefit to users, while there has previously been a GUC parameter
to control HashAggregate, there was no corresponding way to control
GroupAggregate. This patch addresses that, giving users more flexibility
in tuning execution plans.
I've attached a WIP patch that adds this GUC parameter. I would
appreciate any feedback, especially regarding how many test cases I
should create.
To create new test cases for enable_groupagg, I looked into existing
test cases that use enable_hashagg and found that it is used in many
places (62 places). Should I add a test case for enable_groupagg in
the same place as enable_hashagg? I think that adding a new feature
requires a minimum number of test cases, so I would appreciate your
advice.
Additionally, based on the execution plan, I suspect the slowdown in the
parallel query might be caused by misestimates related to Sort or
Gather Merge.
While resolving those misestimates would ideally improve the root issue,
I'd like to keep the focus of this thread on adding the GUC parameter.
Then, I plan to report or address the estimation problem in a separate
thread.
Thanks,
Tatsuro Yamada
Attachments:
0001-Add-new-GUC-parameter-enable_groupagg-WIP-r1.patchapplication/octet-stream; name=0001-Add-new-GUC-parameter-enable_groupagg-WIP-r1.patchDownload
From 461c64770e612661d974acc0fd39815108781580 Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada <yamatattsu@gmail.com>
Date: Thu, 5 Jun 2025 18:50:34 +0900
Subject: [PATCH] Add new GUC parameter: enable_groupagg
Previously, there was no GUC parameter to control the use of
GroupAggregate, so we couldn't influence the planner's choice
in certain queries.
This patch adds a new parameter, "enable_groupagg", which
allows users to enable or disable GroupAggregate explicitly.
By disabling GroupAggregate, the planner may choose
HashAggregate instead, potentially resulting in a more
efficient execution plan for some queries.
---
doc/src/sgml/config.sgml | 14 ++++++++++++++
src/backend/optimizer/path/costsize.c | 3 +++
src/backend/utils/misc/guc_tables.c | 10 ++++++++++
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/optimizer/cost.h | 1 +
src/test/regress/expected/sysviews.out | 3 ++-
6 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 021153b2a5f..edd0f3a13b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5515,6 +5515,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-groupagg" xreflabel="enable_groupagg">
+ <term><varname>enable_groupagg</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_groupagg</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of grouped
+ aggregation plan types. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-hashjoin" xreflabel="enable_hashjoin">
<term><varname>enable_hashjoin</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..80c68008d85 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool enable_tidscan = true;
bool enable_sort = true;
bool enable_incremental_sort = true;
bool enable_hashagg = true;
+bool enable_groupagg = true;
bool enable_nestloop = true;
bool enable_material = true;
bool enable_memoize = true;
@@ -2737,6 +2738,8 @@ cost_agg(Path *path, PlannerInfo *root,
/* Here we are able to deliver output on-the-fly */
startup_cost = input_startup_cost;
total_cost = input_total_cost;
+ if (aggstrategy == AGG_SORTED && !enable_groupagg && enable_hashagg)
+ ++disabled_nodes;
if (aggstrategy == AGG_MIXED && !enable_hashagg)
++disabled_nodes;
/* calcs phrased this way to match HASHED case, see note above */
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f04bfedb2fd..a17b7fb1ba1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -869,6 +869,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_groupagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of grouped aggregation plans."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_groupagg,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 341f88adc87..0514c327767 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -408,6 +408,7 @@
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
+#enable_groupagg = on
#enable_hashjoin = on
#enable_incremental_sort = on
#enable_indexscan = on
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d397fe27dc1..099d41fd7bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
extern PGDLLIMPORT bool enable_sort;
extern PGDLLIMPORT bool enable_incremental_sort;
extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_groupagg;
extern PGDLLIMPORT bool enable_nestloop;
extern PGDLLIMPORT bool enable_material;
extern PGDLLIMPORT bool enable_memoize;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..f10371d6e26 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,6 +153,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_distinct_reordering | on
enable_gathermerge | on
enable_group_by_reordering | on
+ enable_groupagg | on
enable_hashagg | on
enable_hashjoin | on
enable_incremental_sort | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
--
2.43.5
Hi,
# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds
Oops, I made a mistake. The correct execution time is:
- With parallel query: 39546 ms
- With parallel query and enable_groupagg turned off: 1115 ms
Regards,
Tatsuro Yamada
On Fri, Jun 6, 2025 at 1:29 PM Tatsuro Yamada <yamatattsu@gmail.com> wrote:
Hi hackers,
When I measured the execution time of a certain query with parallel query
enabled and disabled, I found that the execution time was slower when
parallel query was enabled.To improve the performance of the parallel query, I considered adjusting
the execution plan and attempted to switch from GroupAggregate to
HashAggregate. However, I noticed that there was no GUC parameter to
disable GroupAggregate.Therefore, I propose adding a new GUC parameter: enable_groupagg.
Below are the results of a performance test where I disabled
GroupAggregate using enable_groupagg. In this case, the planner chose
HashAggregate instead, which improved performance by about 35 times.# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds# Query and Data Used (attached to this email)
- Query: test_query.sql
- Data: create_table.sql# The steps to run the test are as follows.
For example, on psql:1. Create tables:
\i create_table.sql2. Execute a query:
\i test_query.sql3. Execute a query using the new GUC parameter:
set enable_groupagg to off;
\i test_query.sqlAs a benefit to users, while there has previously been a GUC parameter
to control HashAggregate, there was no corresponding way to control
GroupAggregate. This patch addresses that, giving users more flexibility
in tuning execution plans.I've attached a WIP patch that adds this GUC parameter. I would
appreciate any feedback, especially regarding how many test cases I
should create.
I first thought enable_hashagg should be sufficient to choose one strategy
over the other. But that is not true, enable_hashagg = true allows both the
strategies, enable_hashagg = false disables just hash strategy. There's no
way to disable group agg alone. So I think it makes sense to have this GUC.
I am surprised that we didn't see this being a problem for so long.
We seem to disable mixed strategy when enable_hashagg is false. Do we want
to do the same when enable_groupagg = false as well?
To create new test cases for enable_groupagg, I looked into existing
test cases that use enable_hashagg and found that it is used in many
places (62 places). Should I add a test case for enable_groupagg in
the same place as enable_hashagg? I think that adding a new feature
requires a minimum number of test cases, so I would appreciate your
advice.
Some of those instances are for plan stability, all of which need not be
replicated. But some of them explicitly test sort based grouping. For rest
of them hash based plan seems to be the best one, so explicit
enable_groupagg = false is not needed. We will need some test to test the
switch though.
Additionally, based on the execution plan, I suspect the slowdown in the
parallel query might be caused by misestimates related to Sort or
Gather Merge.
While resolving those misestimates would ideally improve the root issue,
I'd like to keep the focus of this thread on adding the GUC parameter.
Then, I plan to report or address the estimation problem in a separate
thread.
+1.
--
Best Wishes,
Ashutosh Bapat
Hi Ashutosh,
Thanks for your reply!
I first thought enable_hashagg should be sufficient to choose one strategy
over the other. But that is not true, enable_hashagg = true allows both the
strategies, enable_hashagg = false disables just hash strategy. There's no
way to disable group agg alone. So I think it makes sense to have this GUC.
Yes, exactly! Thank you for the clarification.
I am surprised that we didn't see this being a problem for so long.
I was also wondering why this GUC parameter hadn't been introduced
earlier.
We seem to disable mixed strategy when enable_hashagg is false. Do we want
to do the same when enable_groupagg = false as well?
As I mentioned in my previous email, setting enable_groupagg = false
resulted in better execution time compared to the mixed strategy (which,
as I understand, includes both HashAgg and GroupAgg). So, I believe
this new GUC parameter would be helpful for users in certain situations.
Here’s how the current and proposed behaviors compare:
Current behavior:
enable_hashagg = ON → Mixed
enable_hashagg = OFF → GroupAgg
After applying the patch:
enable_hashagg = ON, enable_groupagg = ON → Mixed
enable_hashagg = OFF, enable_groupagg = ON → GroupAgg
enable_hashagg = ON, enable_groupagg = OFF → HashAgg (new)
enable_hashagg = OFF, enable_groupagg = OFF → GroupAgg
In addition, if the both parameters = OFF, GroupAgg will be selected in
the patch. The reason is that I found a comment that HashAgg might use
too much memory, so I decided to prioritize using GroupAgg, which is
more secure.
In the last case, I chose to default to GroupAgg since I found a
comment suggesting HashAgg might consume excessive memory, so GroupAgg
seemed the safer fallback.
Some hackers may want to compare the actual execution plans and times
under different GUC settings, so I've attached my test results in
"test_result.txt".
Some of those instances are for plan stability, all of which need not be
replicated. But some of them explicitly test sort based grouping. For rest
of them hash based plan seems to be the best one, so explicit
enable_groupagg = false is not needed. We will need some test to test the
switch though.
Thanks for your advice. I'll create a regression test and send a new patch
to -hackers in my next email.
Regards,
Tatsuro Yamada
Attachments:
Hi Ashtosh and hackers,
Some of those instances are for plan stability, all of which need not be
replicated. But some of them explicitly test sort based grouping. For rest
of them hash based plan seems to be the best one, so explicit
enable_groupagg = false is not needed. We will need some test to test the
switch though.Thanks for your advice. I'll create a regression test and send a new patch
to -hackers in my next email.
I created a regression test to check the enable_groupagg parameter in
the new patch.
To ensure plan stability, I disabled parallel query by setting the
max_parallel_*
parameters to 0.
Any feedback is welcome.
Please see the attached file.
Thanks,
Tatsuro Yamada
Attachments:
0001-Add-new-GUC-parameter-enable_groupagg-WIP-r2.patchapplication/octet-stream; name=0001-Add-new-GUC-parameter-enable_groupagg-WIP-r2.patchDownload
From 2d9798207cdbb1bbe7c1858e36d69bdb17103ecf Mon Sep 17 00:00:00 2001
From: Tatsuro Yamada <yamatattsu@gmail.com>
Date: Thu, 5 Jun 2025 18:50:34 +0900
Subject: [PATCH] Add new GUC parameter: enable_groupagg
Previously, there was no GUC parameter to control the use of
GroupAggregate, so we couldn't influence the planner's choice
in certain queries.
This patch adds a new parameter, "enable_groupagg", which
allows users to enable or disable GroupAggregate explicitly.
By disabling GroupAggregate, the planner may choose
HashAggregate instead, potentially resulting in a more
efficient execution plan for some queries.
---
doc/src/sgml/config.sgml | 14 ++
src/backend/optimizer/path/costsize.c | 3 +
src/backend/utils/misc/guc_tables.c | 10 ++
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/optimizer/cost.h | 1 +
src/test/regress/expected/aggregates.out | 120 ++++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/sql/aggregates.sql | 81 ++++++++++++
8 files changed, 232 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 021153b2a5f..edd0f3a13b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5515,6 +5515,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-groupagg" xreflabel="enable_groupagg">
+ <term><varname>enable_groupagg</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_groupagg</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of grouped
+ aggregation plan types. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-hashjoin" xreflabel="enable_hashjoin">
<term><varname>enable_hashjoin</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 3d44815ed5a..80c68008d85 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -150,6 +150,7 @@ bool enable_tidscan = true;
bool enable_sort = true;
bool enable_incremental_sort = true;
bool enable_hashagg = true;
+bool enable_groupagg = true;
bool enable_nestloop = true;
bool enable_material = true;
bool enable_memoize = true;
@@ -2737,6 +2738,8 @@ cost_agg(Path *path, PlannerInfo *root,
/* Here we are able to deliver output on-the-fly */
startup_cost = input_startup_cost;
total_cost = input_total_cost;
+ if (aggstrategy == AGG_SORTED && !enable_groupagg && enable_hashagg)
+ ++disabled_nodes;
if (aggstrategy == AGG_MIXED && !enable_hashagg)
++disabled_nodes;
/* calcs phrased this way to match HASHED case, see note above */
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f04bfedb2fd..a17b7fb1ba1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -869,6 +869,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_groupagg", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables the planner's use of grouped aggregation plans."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_groupagg,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_material", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of materialization."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 341f88adc87..0514c327767 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -408,6 +408,7 @@
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
+#enable_groupagg = on
#enable_hashjoin = on
#enable_incremental_sort = on
#enable_indexscan = on
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index d397fe27dc1..099d41fd7bd 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -57,6 +57,7 @@ extern PGDLLIMPORT bool enable_tidscan;
extern PGDLLIMPORT bool enable_sort;
extern PGDLLIMPORT bool enable_incremental_sort;
extern PGDLLIMPORT bool enable_hashagg;
+extern PGDLLIMPORT bool enable_groupagg;
extern PGDLLIMPORT bool enable_nestloop;
extern PGDLLIMPORT bool enable_material;
extern PGDLLIMPORT bool enable_memoize;
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 1f1ce2380af..0911cb33c0a 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -3581,3 +3581,123 @@ drop table agg_hash_1;
drop table agg_hash_2;
drop table agg_hash_3;
drop table agg_hash_4;
+-- create table to check enable_groupagg
+CREATE TABLE test_groupagg(
+ id serial primary key,
+ c1 text,
+ c2 text,
+ c3 numeric);
+INSERT INTO test_groupagg (c1, c2, c3) VALUES
+('a', 'GGG', 100),
+('a', 'GGG', 150),
+('a', 'rrr', 200),
+('b', 'ooo', 300),
+('b', 'ooo', 250),
+('b', 'uuu', 100),
+('c', 'ppp', 500),
+('c', 'ppp', 600),
+('c', 'aaa', 550);
+ANALYZE;
+-- default: GroupAgg and HashAgg are mixed and selected
+SET max_parallel_workers to 0;
+SET max_parallel_workers_per_gather to 0;
+SET enable_hashagg to default;
+SET enable_groupagg to default;
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+ QUERY PLAN
+-----------------------------------------------------------------------------
+ GroupAggregate
+ Group Key: sub.c1
+ -> Sort
+ Sort Key: sub.c1
+ -> Subquery Scan on sub
+ -> HashAggregate
+ Group Key: test_groupagg.c1, test_groupagg.c2
+ -> Seq Scan on test_groupagg
+ Settings: max_parallel_workers = '0', max_parallel_workers_per_gather = '0'
+(9 rows)
+
+-- Only GroupAgg is selected
+set enable_hashagg to off;
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------
+ GroupAggregate
+ Group Key: test_groupagg.c1
+ -> GroupAggregate
+ Group Key: test_groupagg.c1, test_groupagg.c2
+ -> Sort
+ Sort Key: test_groupagg.c1, test_groupagg.c2
+ -> Seq Scan on test_groupagg
+ Settings: max_parallel_workers = '0', max_parallel_workers_per_gather = '0', enable_hashagg = 'off'
+(8 rows)
+
+-- Only HashAgg is selected
+SET enable_hashagg to on;
+SET enable_groupagg to off;
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: test_groupagg.c1
+ -> HashAggregate
+ Group Key: test_groupagg.c1
+ -> HashAggregate
+ Group Key: test_groupagg.c1, test_groupagg.c2
+ -> Seq Scan on test_groupagg
+ Settings: max_parallel_workers = '0', max_parallel_workers_per_gather = '0', enable_groupagg = 'off'
+(8 rows)
+
+-- Only GroupAgg is selected as a fallback
+SET enable_hashagg to off;
+SET enable_groupagg to off;
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ GroupAggregate
+ Group Key: test_groupagg.c1
+ -> GroupAggregate
+ Group Key: test_groupagg.c1, test_groupagg.c2
+ -> Sort
+ Sort Key: test_groupagg.c1, test_groupagg.c2
+ -> Seq Scan on test_groupagg
+ Settings: max_parallel_workers = '0', max_parallel_workers_per_gather = '0', enable_hashagg = 'off', enable_groupagg = 'off'
+(8 rows)
+
+RESET enable_hashagg;
+RESET enable_groupagg;
+RESET max_parallel_workers;
+RESET max_parallel_workers_per_gather;
+DROP TABLE test_groupagg;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..f10371d6e26 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -153,6 +153,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_distinct_reordering | on
enable_gathermerge | on
enable_group_by_reordering | on
+ enable_groupagg | on
enable_hashagg | on
enable_hashjoin | on
enable_incremental_sort | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 277b4b198cc..92493ead1f9 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1644,3 +1644,84 @@ drop table agg_hash_1;
drop table agg_hash_2;
drop table agg_hash_3;
drop table agg_hash_4;
+
+-- create table to check enable_groupagg
+CREATE TABLE test_groupagg(
+ id serial primary key,
+ c1 text,
+ c2 text,
+ c3 numeric);
+INSERT INTO test_groupagg (c1, c2, c3) VALUES
+('a', 'GGG', 100),
+('a', 'GGG', 150),
+('a', 'rrr', 200),
+('b', 'ooo', 300),
+('b', 'ooo', 250),
+('b', 'uuu', 100),
+('c', 'ppp', 500),
+('c', 'ppp', 600),
+('c', 'aaa', 550);
+ANALYZE;
+
+-- default: GroupAgg and HashAgg are mixed and selected
+SET max_parallel_workers to 0;
+SET max_parallel_workers_per_gather to 0;
+SET enable_hashagg to default;
+SET enable_groupagg to default;
+
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+
+-- Only GroupAgg is selected
+set enable_hashagg to off;
+
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+
+-- Only HashAgg is selected
+SET enable_hashagg to on;
+SET enable_groupagg to off;
+
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+
+-- Only GroupAgg is selected as a fallback
+SET enable_hashagg to off;
+SET enable_groupagg to off;
+
+EXPLAIN(costs off, settings)
+SELECT c1, AVG(total)
+FROM (
+ SELECT c1, c2, SUM(c3) AS total
+ FROM test_groupagg
+ GROUP BY c1, c2
+) AS sub
+GROUP BY c1
+ORDER BY c1;
+
+RESET enable_hashagg;
+RESET enable_groupagg;
+RESET max_parallel_workers;
+RESET max_parallel_workers_per_gather;
+DROP TABLE test_groupagg;
--
2.43.5
On Wed, 11 Jun 2025 at 20:37, Tatsuro Yamada <yamatattsu@gmail.com> wrote:
I created a regression test to check the enable_groupagg parameter in
the new patch.
To ensure plan stability, I disabled parallel query by setting the max_parallel_*
parameters to 0.Any feedback is welcome.
Typically, in the regression tests we've used enable_sort to force a
HashAgg. There are certainly times when that's not good enough and you
might also need to disabe enable_indexscan too, so I understand the
desire to add this GUC.
It's probably going to be worth going over the regression tests to
find where we use enable_sort to disable GroupAgg and replace those
with your new GUC. Otherwise, people looking at those tests in the
future will be a bit confused as to why the test didn't just SET
enable_groupagg TO false; These will likely be good enough to serve
as your test, rather than creating a new table to test this feature.
I think you should also look at create_setop_path(), as I imagine that
the same arguments for using enable_hashagg in that function apply
equally to enable_groupagg.
+ if (aggstrategy == AGG_SORTED && !enable_groupagg && enable_hashagg)
+ ++disabled_nodes;
This code looks a bit strange. You're only going to disable it if hash
agg is enabled? If they're both disabled, let add_path() decide.
Anyone who complains that they didn't get the aggregate type they
wanted with both enable_hashagg and enable_groupagg set to off hasn't
got a leg to stand on.
David
Hi David,
Typically, in the regression tests we've used enable_sort to force a
HashAgg. There are certainly times when that's not good enough and you
might also need to disabe enable_indexscan too, so I understand the
desire to add this GUC.
Thank you for the explanation.
I wasn't aware that enable_sort could be used to switch from GroupAgg
to HashAgg.
From a user's perspective, I think many would expect that if
enable_hashagg exists, then enable_groupagg would as well. Adding
such a GUC parameter seems intuitive and reasonable.So, I believe
there's value in introducing the parameter I proposed.
On the other hand, if this technique (using enable_sort) is already
widely known and commonly used, and I'm simply unfamiliar with it,
then perhaps adding this GUC might not be worth the effort.
If several people support the idea of adding this GUC parameter,
I'm thinking of creating and submitting a patch that incorporates your
comment below. I believe both David and Ashutosh support the proposal.
Can I go ahead as planned?
What do others involved in planner and plan-tuning think?
It's probably going to be worth going over the regression tests to
find where we use enable_sort to disable GroupAgg and replace those
with your new GUC. Otherwise, people looking at those tests in the
future will be a bit confused as to why the test didn't just SET
enable_groupagg TO false; These will likely be good enough to serve
as your test, rather than creating a new table to test this feature.
I agree.
Replacing existing enable_sort usages in regression tests with the new
GUC parameter would help future developers understand the intent more
clearly.
Also, I realized we can sufficiently test the feature without creating
a new table.
I think you should also look at create_setop_path(), as I imagine that
the same arguments for using enable_hashagg in that function apply
equally to enable_groupagg.
I looked into create_setop_path(), and I see that the aggregation
method is currently controlled as follows:
=====
/*
* Mark the path as disabled if enable_hashagg is off. While this
* isn't exactly a HashAgg node, it seems close enough to justify
* letting that switch control it.
*/
if (!enable_hashagg)
pathnode->path.disabled_nodes++;
=====
I plan to add enable_groupagg handling in a similar way.
+ if (aggstrategy == AGG_SORTED && !enable_groupagg && enable_hashagg) + ++disabled_nodes;This code looks a bit strange. You're only going to disable it if hash
agg is enabled? If they're both disabled, let add_path() decide.
Anyone who complains that they didn't get the aggregate type they
wanted with both enable_hashagg and enable_groupagg set to off hasn't
got a leg to stand on.
My intention was to make groupagg the fallback when both are disabled.
However, I understand that if the both are disabled, it's acceptable to
let the planner decide which strategy to use. I'll revise that part
accordingly.
Thanks,
Tatsuro Yamada