Issue with query_is_distinct_for() and grouping sets

Started by Richard Guo3 months ago8 messages

guofenglinux@gmail.com

3 months ago

1 attachment(s)

While working on the fix for the push-down-HAVING optimization, I
noticed $subject.

query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns. For a query using
grouping sets, if there are no grouping expressions, we know there are
no non-empty grouping sets and potentially one or more empty grouping
sets. The idea is to check whether there is only one empty grouping
set, in which case the query would return only one row and thus
unique. Currently the check being performed is:

if (list_length(query->groupingSets) == 1 &&
((GroupingSet *) linitial(query->groupingSets))->kind ==
GROUPING_SET_EMPTY)
return true;
else
return false;

It seems to me that this check is not thorough enough. First, if the
DISTINCT clause is used on the GROUP BY, duplicate empty grouping sets
will be removed, leaving only one. However, the current code does not
check groupDistinct at all.

Second, if the GROUP BY clause is written in the form of "grouping
sets(())", the GroupingSet is actually a GROUPING_SET_SETS node
containing a single GROUPING_SET_EMPTY node, which still represents
only one empty grouping set. The current check does not account for
this structure.

Third, the check "list_length(query->groupingSets) == 1" is also
insufficient. It's possible to have multiple GroupingSet nodes and
still end up with a single empty grouping set, as long as each node
contains only one GROUPING_SET_EMPTY element, for example, in a query
like "group by grouping sets (()), grouping sets (())".

To fix this issue, I suggest first checking the groupDistinct flag,
which is straightforward and does not cost much. After that, there
seem to be two possible approaches to verify whether there is only one
empty grouping set.

1) In addition to the current check, we can iterate through
query->groupingSets and verify that each GroupingSet node contains
exactly one element. This should cover all possible structures that
represent a single empty grouping set.

2) Alternatively, we can perform expand_grouping_sets() and then check
whether the resulting expanded flat list contains only one element.

I'm somewhat inclined toward option #2, as it seems to be a more
principled approach. It could be argued that calling
expand_grouping_sets() can introduce planning overhead, but I don't
think this is a problem, because 1) we're dealing with a simple
grouping sets structure that contains only empty grouping sets, and 2)
since we have already checked groupDistinct, there's no need to
perform deduplication of grouping sets within expand_grouping_sets().
Therefore, I expect the overhead from calling expand_grouping_sets()
to be minimal.

Attached is a patch along the lines of option #2. The LCOV report
indicates that there is currently no test coverage for the "else if
(query->groupingSets)" branch in query_is_distinct_for(). This patch
also adds test cases to cover that branch.

Thoughts?

- Richard

Attachments:

v1-0001-Fix-distinctness-check-for-queries-with-grouping-.patchapplication/octet-stream; name=v1-0001-Fix-distinctness-check-for-queries-with-grouping-.patchDownload

From 7cc966721a74047108aa53e4f166c3457b20db85 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 22 Oct 2025 12:25:08 +0900
Subject: [PATCH v1] Fix distinctness check for queries with grouping sets

---
 src/backend/optimizer/plan/analyzejoins.c | 19 +++---
 src/test/regress/expected/join.out        | 75 +++++++++++++++++++++++
 src/test/regress/sql/join.sql             | 28 +++++++++
 3 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6a3c030e8ef..de7cb59e5c2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -31,6 +31,7 @@
 #include "optimizer/placeholder.h"
 #include "optimizer/planmain.h"
 #include "optimizer/restrictinfo.h"
+#include "parser/parse_agg.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 
@@ -1175,6 +1176,8 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 	}
 	else if (query->groupingSets)
 	{
+		List	   *gsets;
+
 		/*
 		 * If we have grouping sets with expressions, we probably don't have
 		 * uniqueness and analysis would be hard. Punt.
@@ -1184,15 +1187,17 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 
 		/*
 		 * If we have no groupClause (therefore no grouping expressions), we
-		 * might have one or many empty grouping sets. If there's just one,
-		 * then we're returning only one row and are certainly unique. But
-		 * otherwise, we know we're certainly not unique.
+		 * might have one or many empty grouping sets. If there's just one, or
+		 * if the DISTINCT clause is used on the GROUP BY, then we're
+		 * returning only one row and are certainly unique. But otherwise, we
+		 * know we're certainly not unique.
 		 */
-		if (list_length(query->groupingSets) == 1 &&
-			((GroupingSet *) linitial(query->groupingSets))->kind == GROUPING_SET_EMPTY)
+		if (query->groupDistinct)
 			return true;
-		else
-			return false;
+
+		gsets = expand_grouping_sets(query->groupingSets, false, -1);
+
+		return (list_length(gsets) == 1);
 	}
 	else
 	{
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d10095de70f..80a6148810c 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6134,6 +6134,32 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
  Seq Scan on d
 (1 row)
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -6162,6 +6188,55 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
          ->  Seq Scan on d
 (8 rows)
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+           QUERY PLAN            
+---------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  MixedAggregate
+               Hash Key: 1
+               Group Key: ()
+               ->  Seq Scan on b
+(8 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index b1732453e8d..3de8dff2729 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2212,6 +2212,20 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id and d.b = s.c_id;
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -2225,6 +2239,20 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
-- 
2.39.5 (Apple Git-154)

Richard Guo

guofenglinux@gmail.com

3 months ago

In reply to: Richard Guo (#1)

1 attachment(s)

Re: Issue with query_is_distinct_for() and grouping sets

On Wed, Oct 22, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com> wrote:

Attached is a patch along the lines of option #2. The LCOV report
indicates that there is currently no test coverage for the "else if
(query->groupingSets)" branch in query_is_distinct_for(). This patch
also adds test cases to cover that branch.

Here is an updated patch that includes a commit message and adds a new
test case involving DISTINCT clause used with GROUP BY.

- Richard

Attachments:

v2-0001-Fix-distinctness-check-for-queries-with-grouping-.patchapplication/octet-stream; name=v2-0001-Fix-distinctness-check-for-queries-with-grouping-.patchDownload

From 2db5d944c7f353c4d1e618ff27ca7eaeb1f9244f Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 22 Oct 2025 12:25:08 +0900
Subject: [PATCH v2] Fix distinctness check for queries with grouping sets

query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns.  For queries using
grouping sets, if there are no grouping expressions, the query may
contain one or more empty grouping sets.  The goal is to detect
whether there is exactly one empty grouping set, in which case the
query would return a single row and thus be distinct.

The previous logic in query_is_distinct_for() was incomplete.  It
failed to consider cases where the DISTINCT clause is used on the
GROUP BY, in which case duplicate empty grouping sets are removed,
leaving only one.  It also did not correctly handle all possible
structures of GroupingSet nodes that represent a single empty grouping
set.

To fix, add a check for the groupDistinct flag, and expand the query's
groupingSets tree into a flat list, then verify that the expanded list
contains only one element.

No backpatch as this could result in plan changes.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480Z04NtP8-O55uROq2Zego309+h3hhaZhz6ztmgWLEBw@mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 19 ++++--
 src/test/regress/expected/join.out        | 83 +++++++++++++++++++++++
 src/test/regress/sql/join.sql             | 32 +++++++++
 3 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6a3c030e8ef..de7cb59e5c2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -31,6 +31,7 @@
 #include "optimizer/placeholder.h"
 #include "optimizer/planmain.h"
 #include "optimizer/restrictinfo.h"
+#include "parser/parse_agg.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 
@@ -1175,6 +1176,8 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 	}
 	else if (query->groupingSets)
 	{
+		List	   *gsets;
+
 		/*
 		 * If we have grouping sets with expressions, we probably don't have
 		 * uniqueness and analysis would be hard. Punt.
@@ -1184,15 +1187,17 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 
 		/*
 		 * If we have no groupClause (therefore no grouping expressions), we
-		 * might have one or many empty grouping sets. If there's just one,
-		 * then we're returning only one row and are certainly unique. But
-		 * otherwise, we know we're certainly not unique.
+		 * might have one or many empty grouping sets. If there's just one, or
+		 * if the DISTINCT clause is used on the GROUP BY, then we're
+		 * returning only one row and are certainly unique. But otherwise, we
+		 * know we're certainly not unique.
 		 */
-		if (list_length(query->groupingSets) == 1 &&
-			((GroupingSet *) linitial(query->groupingSets))->kind == GROUPING_SET_EMPTY)
+		if (query->groupDistinct)
 			return true;
-		else
-			return false;
+
+		gsets = expand_grouping_sets(query->groupingSets, false, -1);
+
+		return (list_length(gsets) == 1);
 	}
 	else
 	{
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d10095de70f..fafe132af3d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6134,6 +6134,40 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
  Seq Scan on d
 (1 row)
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by distinct grouping sets((), ())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -6162,6 +6196,55 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
          ->  Seq Scan on d
 (8 rows)
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+           QUERY PLAN            
+---------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  MixedAggregate
+               Hash Key: 1
+               Group Key: ()
+               ->  Seq Scan on b
+(8 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index b1732453e8d..40d901a5307 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2212,6 +2212,24 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id and d.b = s.c_id;
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by distinct grouping sets((), ())) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -2225,6 +2243,20 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
-- 
2.39.5 (Apple Git-154)

David Rowley

dgrowleyml@gmail.com

3 months ago

In reply to: Richard Guo (#2)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, 23 Oct 2025 at 15:45, Richard Guo <guofenglinux@gmail.com> wrote:

On Wed, Oct 22, 2025 at 6:25 PM Richard Guo <guofenglinux@gmail.com> wrote:

Attached is a patch along the lines of option #2. The LCOV report
indicates that there is currently no test coverage for the "else if
(query->groupingSets)" branch in query_is_distinct_for(). This patch
also adds test cases to cover that branch.

Here is an updated patch that includes a commit message and adds a new
test case involving DISTINCT clause used with GROUP BY.

I've not reviewed the patch, so I'm assuming this is broken and your
fix is correct, but I did see your commit message says:

No backpatch as this could result in plan changes.

If this is broken then it'll need to be backpatched as if that
function returns true when it should return false, then you could have
LEFT JOINs being removed when they shouldn't or joins being marked as
"Inner Unique" when they shouldn't, which could result in incorrect
query results.

David

David Rowley

dgrowleyml@gmail.com

3 months ago

In reply to: David Rowley (#3)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, 23 Oct 2025 at 15:59, David Rowley <dgrowleyml@gmail.com> wrote:

No backpatch as this could result in plan changes.

If this is broken then it'll need to be backpatched as if that
function returns true when it should return false, then you could have
LEFT JOINs being removed when they shouldn't or joins being marked as
"Inner Unique" when they shouldn't, which could result in incorrect
query results.

Or if it's a case of it returning false when it could have returned
true, then maybe the commit message should make that clear. I'm unable
to tell from reading it. Something like; "The previous logic in
query_is_distinct_for() was incomplete [as it failed to detect that a
query was distinct when ...]".

David

Richard Guo

guofenglinux@gmail.com

3 months ago

In reply to: David Rowley (#4)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, Oct 23, 2025 at 12:07 PM David Rowley <dgrowleyml@gmail.com> wrote:

Or if it's a case of it returning false when it could have returned
true, then maybe the commit message should make that clear. I'm unable
to tell from reading it. Something like; "The previous logic in
query_is_distinct_for() was incomplete [as it failed to detect that a
query was distinct when ...]".

It's the case of failing to recognize distinctness when it actually
holds (i.e., a false negative). Therefore, this issue does not cause
incorrect results, but rather leads to missed optimization
opportunities.

How about using the following wording in the commit message?

"
The previous logic in query_is_distinct_for() was incomplete because
the check was insufficiently thorough and could return false when it
should have returned true.
"

- Richard

David Rowley

dgrowleyml@gmail.com

3 months ago

In reply to: Richard Guo (#5)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, 23 Oct 2025 at 16:43, Richard Guo <guofenglinux@gmail.com> wrote:

How about using the following wording in the commit message?

"
The previous logic in query_is_distinct_for() was incomplete because
the check was insufficiently thorough and could return false when it
should have returned true.
"

"it could" might be more accurate. The function is under no obligation
to return true for every possible case.

David

Richard Guo

guofenglinux@gmail.com

3 months ago

In reply to: David Rowley (#6)

1 attachment(s)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, Oct 23, 2025 at 1:33 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 23 Oct 2025 at 16:43, Richard Guo <guofenglinux@gmail.com> wrote:

How about using the following wording in the commit message?

"
The previous logic in query_is_distinct_for() was incomplete because
the check was insufficiently thorough and could return false when it
should have returned true.
"

"it could" might be more accurate. The function is under no obligation
to return true for every possible case.

Fair point. Patch updated with a revised commit message.

- Richard

Attachments:

v3-0001-Fix-distinctness-check-for-queries-with-grouping-.patchapplication/octet-stream; name=v3-0001-Fix-distinctness-check-for-queries-with-grouping-.patchDownload

From 081782d04bf49eb7dbccdc42666932b3b9aa1fb7 Mon Sep 17 00:00:00 2001
From: Richard Guo <guofenglinux@gmail.com>
Date: Wed, 22 Oct 2025 12:25:08 +0900
Subject: [PATCH v3] Fix distinctness check for queries with grouping sets

query_is_distinct_for() is intended to determine whether a query never
returns duplicates of the specified columns.  For queries using
grouping sets, if there are no grouping expressions, the query may
contain one or more empty grouping sets.  The goal is to detect
whether there is exactly one empty grouping set, in which case the
query would return a single row and thus be distinct.

The previous logic in query_is_distinct_for() was incomplete because
the check was insufficiently thorough and could return false when it
could have returned true.  It failed to consider cases where the
DISTINCT clause is used on the GROUP BY, in which case duplicate empty
grouping sets are removed, leaving only one.  It also did not
correctly handle all possible structures of GroupingSet nodes that
represent a single empty grouping set.

To fix, add a check for the groupDistinct flag, and expand the query's
groupingSets tree into a flat list, then verify that the expanded list
contains only one element.

No backpatch as this could result in plan changes.

Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/CAMbWs480Z04NtP8-O55uROq2Zego309+h3hhaZhz6ztmgWLEBw@mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 19 ++++--
 src/test/regress/expected/join.out        | 83 +++++++++++++++++++++++
 src/test/regress/sql/join.sql             | 32 +++++++++
 3 files changed, 127 insertions(+), 7 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 6a3c030e8ef..de7cb59e5c2 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -31,6 +31,7 @@
 #include "optimizer/placeholder.h"
 #include "optimizer/planmain.h"
 #include "optimizer/restrictinfo.h"
+#include "parser/parse_agg.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 
@@ -1175,6 +1176,8 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 	}
 	else if (query->groupingSets)
 	{
+		List	   *gsets;
+
 		/*
 		 * If we have grouping sets with expressions, we probably don't have
 		 * uniqueness and analysis would be hard. Punt.
@@ -1184,15 +1187,17 @@ query_is_distinct_for(Query *query, List *colnos, List *opids)
 
 		/*
 		 * If we have no groupClause (therefore no grouping expressions), we
-		 * might have one or many empty grouping sets. If there's just one,
-		 * then we're returning only one row and are certainly unique. But
-		 * otherwise, we know we're certainly not unique.
+		 * might have one or many empty grouping sets. If there's just one, or
+		 * if the DISTINCT clause is used on the GROUP BY, then we're
+		 * returning only one row and are certainly unique. But otherwise, we
+		 * know we're certainly not unique.
 		 */
-		if (list_length(query->groupingSets) == 1 &&
-			((GroupingSet *) linitial(query->groupingSets))->kind == GROUPING_SET_EMPTY)
+		if (query->groupDistinct)
 			return true;
-		else
-			return false;
+
+		gsets = expand_grouping_sets(query->groupingSets, false, -1);
+
+		return (list_length(gsets) == 1);
 	}
 	else
 	{
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index d10095de70f..fafe132af3d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6134,6 +6134,40 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
  Seq Scan on d
 (1 row)
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by distinct grouping sets((), ())) s
+  on d.a = s.x;
+  QUERY PLAN   
+---------------
+ Seq Scan on d
+(1 row)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -6162,6 +6196,55 @@ select d.* from d left join (select * from b group by b.id, b.c_id) s
          ->  Seq Scan on d
 (8 rows)
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+           QUERY PLAN            
+---------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  MixedAggregate
+               Hash Key: 1
+               Group Key: ()
+               ->  Seq Scan on b
+(8 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+               QUERY PLAN                
+-----------------------------------------
+ Hash Left Join
+   Hash Cond: (d.a = (1))
+   ->  Seq Scan on d
+   ->  Hash
+         ->  Append
+               ->  Result
+                     Replaces: Aggregate
+               ->  Result
+                     Replaces: Aggregate
+(9 rows)
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index b1732453e8d..40d901a5307 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2212,6 +2212,24 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id and d.b = s.c_id;
 
+-- check that join removal works for a left join when joining a subquery
+-- that is guaranteed to be unique by GROUPING SETS
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by ()) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets(()), grouping sets(())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by distinct grouping sets((), ())) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
@@ -2225,6 +2243,20 @@ explain (costs off)
 select d.* from d left join (select * from b group by b.id, b.c_id) s
   on d.a = s.id;
 
+-- join removal is not possible when the GROUP BY contains non-empty grouping
+-- sets or multiple empty grouping sets
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by rollup(x)) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), ())) s
+  on d.a = s.x;
+
+explain (costs off)
+select d.* from d left join (select 1 as x from b group by grouping sets((), grouping sets(()))) s
+  on d.a = s.x;
+
 -- similarly, but keying off a DISTINCT clause
 explain (costs off)
 select d.* from d left join (select distinct * from b) s
-- 
2.39.5 (Apple Git-154)

Richard Guo

guofenglinux@gmail.com

about 1 month ago

In reply to: Richard Guo (#7)

Re: Issue with query_is_distinct_for() and grouping sets

On Thu, Oct 23, 2025 at 4:07 PM Richard Guo <guofenglinux@gmail.com> wrote:

Fair point. Patch updated with a revised commit message.

... and then pushed.

- Richard