GROUP BY ALL

Started by Andrey Borodinabout 3 years ago8 messages

amborodin86@gmail.com

about 3 years ago

1 attachment(s)

Hi hackers!

I saw a thread in a social network[0]https://www.linkedin.com/posts/mosha_duckdb-firebolt-snowflake-activity-7009615821006131200-VQ0o/ about GROUP BY ALL. The idea seems useful.
I always was writing something like
select datname, usename, count(*) from pg_stat_activity group by 1,2;
and then rewriting to
select datname, usename, query, count(*) from pg_stat_activity group by 1,2;
and then "aaahhhh, add a number at the end".

With the proposed feature I can write just
select datname, usename, count(*) from pg_stat_activity group by all;

PFA very dummy implementation just for a discussion. I think we can
add all non-aggregating targets.

What do you think?

Best regards, Andrey Borodin.

[0]: https://www.linkedin.com/posts/mosha_duckdb-firebolt-snowflake-activity-7009615821006131200-VQ0o/

Attachments:

v1-0001-Implement-GROUP-BY-ALL.patchapplication/octet-stream; name=v1-0001-Implement-GROUP-BY-ALL.patchDownload

From e5f9ca89d577926155cc94e0ea5b5bbfefbd331d Mon Sep 17 00:00:00 2001
From: Andrey Borodin <xformmm@amazon.com>
Date: Sun, 18 Dec 2022 19:52:48 -0800
Subject: [PATCH v1] Implement GROUP BY ALL

---
 src/backend/parser/analyze.c         |  1 +
 src/backend/parser/gram.y            | 14 ++++++++++++++
 src/backend/parser/parse_agg.c       | 23 ++++++++++++++++++++++-
 src/backend/utils/adt/ruleutils.c    |  3 +++
 src/backend/utils/misc/queryjumble.c |  1 +
 src/include/nodes/parsenodes.h       |  2 ++
 6 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 6688c2a865..71d12ead79 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -1347,6 +1347,7 @@ transformSelectStmt(ParseState *pstate, SelectStmt *stmt)
 											EXPR_KIND_GROUP_BY,
 											false /* allow SQL92 rules */ );
 	qry->groupDistinct = stmt->groupDistinct;
+	qry->groupAll = stmt->groupAll;
 
 	if (stmt->distinctClause == NIL)
 	{
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b1ae5f834c..84f8a4146a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -135,6 +135,7 @@ typedef struct SelectLimit
 typedef struct GroupClause
 {
 	bool		distinct;
+	bool		all;
 	List	   *list;
 } GroupClause;
 
@@ -12580,6 +12581,7 @@ simple_select:
 					n->whereClause = $6;
 					n->groupClause = ($7)->list;
 					n->groupDistinct = ($7)->distinct;
+					n->groupAll = ($7)->all;
 					n->havingClause = $8;
 					n->windowClause = $9;
 					$$ = (Node *) n;
@@ -12597,6 +12599,7 @@ simple_select:
 					n->whereClause = $6;
 					n->groupClause = ($7)->list;
 					n->groupDistinct = ($7)->distinct;
+					n->groupAll = ($7)->all;
 					n->havingClause = $8;
 					n->windowClause = $9;
 					$$ = (Node *) n;
@@ -13074,14 +13077,25 @@ group_clause:
 					GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause));
 
 					n->distinct = $3 == SET_QUANTIFIER_DISTINCT;
+					n->all = false;
 					n->list = $4;
 					$$ = n;
 				}
+			| GROUP_P BY ALL
+				{
+					GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause));
+
+					n->all = true;
+					n->distinct = false;
+					n->list = NIL;
+					$$ = n;
+				}
 			| /*EMPTY*/
 				{
 					GroupClause *n = (GroupClause *) palloc(sizeof(GroupClause));
 
 					n->distinct = false;
+					n->all = false;
 					n->list = NIL;
 					$$ = n;
 				}
diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c
index 3ef9e8ee5e..8826829dbc 100644
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -20,6 +20,7 @@
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
+#include "parser/analyze.h"
 #include "parser/parse_agg.h"
 #include "parser/parse_clause.h"
 #include "parser/parse_coerce.h"
@@ -1072,7 +1073,27 @@ parseCheckAggregates(ParseState *pstate, Query *qry)
 	Node	   *clause;
 
 	/* This should only be called if we found aggregates or grouping */
-	Assert(pstate->p_hasAggs || qry->groupClause || qry->havingQual || qry->groupingSets);
+	Assert(pstate->p_hasAggs || qry->groupClause || qry->havingQual || qry->groupingSets || qry->groupAll);
+
+	Assert((!qry->groupAll) || (qry->groupClause == NULL));
+
+	if (qry->groupAll)
+	{
+		Index idx = 1;
+		Index sge_idx = 1;
+		foreach(l, qry->targetList)
+		{
+			TargetEntry *tle = lfirst(l);
+			if (IsA(tle->expr, Var))
+			{
+				Oid	restype = exprType((Node *) tle->expr);
+				SortGroupClause *sgc = makeSortGroupClauseForSetOp(restype, false);
+				sgc->tleSortGroupRef = sge_idx++;
+				qry->groupClause = lappend(qry->groupClause, sgc);
+				tle->ressortgroupref = idx++;
+			}
+		}
+	}
 
 	/*
 	 * If we have grouping sets, expand them and find the intersection of all
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index a20a1b069b..d0d4711c53 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5959,6 +5959,9 @@ get_basic_select_query(Query *query, deparse_context *context,
 		if (query->groupDistinct)
 			appendStringInfoString(buf, "DISTINCT ");
 
+		if (query->groupAll)
+			appendStringInfoString(buf, "ALL ");
+
 		save_exprkind = context->special_exprkind;
 		context->special_exprkind = EXPR_KIND_GROUP_BY;
 
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index 0ace74de78..feac9aa8b2 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -254,6 +254,7 @@ JumbleQueryInternal(JumbleState *jstate, Query *query)
 	JumbleExpr(jstate, (Node *) query->returningList);
 	JumbleExpr(jstate, (Node *) query->groupClause);
 	APP_JUMB(query->groupDistinct);
+	APP_JUMB(query->groupAll);
 	JumbleExpr(jstate, (Node *) query->groupingSets);
 	JumbleExpr(jstate, query->havingQual);
 	JumbleExpr(jstate, (Node *) query->windowClause);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 6112cd85c8..5a0b1a43cf 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -170,6 +170,7 @@ typedef struct Query
 
 	List	   *groupClause;	/* a list of SortGroupClause's */
 	bool		groupDistinct;	/* is the group by clause distinct? */
+	bool		groupAll;
 
 	List	   *groupingSets;	/* a list of GroupingSet's if present */
 
@@ -1737,6 +1738,7 @@ typedef struct SelectStmt
 	Node	   *whereClause;	/* WHERE qualification */
 	List	   *groupClause;	/* GROUP BY clauses */
 	bool		groupDistinct;	/* Is this GROUP BY DISTINCT? */
+	bool		groupAll;
 	Node	   *havingClause;	/* HAVING conditional-expression */
 	List	   *windowClause;	/* WINDOW window_name AS (...), ... */
 
-- 
2.37.0 (Apple Git-136)

Tom Lane

tgl@sss.pgh.pa.us

about 3 years ago

In reply to: Andrey Borodin (#1)

Re: GROUP BY ALL

Andrey Borodin <amborodin86@gmail.com> writes:

I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

What would happen if there are aggregate functions in the tlist?
I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

regards, tom lane

Andrey Borodin

amborodin86@gmail.com

about 3 years ago

In reply to: Tom Lane (#2)

Re: GROUP BY ALL

On Sun, Dec 18, 2022 at 8:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

Yes, that's the weak part of the proposal. I even thought about
renaming it to "GROUP BY SOMEHOW" or even "GROUP BY SURPRISE ME".
I mean I see some cases when it's useful and much less cases when it's
dangerously ambiguous. E.g. grouping by result of a subquery looks way
too complex and unpredictable. But with simple Vars... what could go
wrong?

Best regards, Andrey Borodin.

David G. Johnston

david.g.johnston@gmail.com

about 3 years ago

In reply to: Tom Lane (#2)

Re: GROUP BY ALL

On Sunday, December 18, 2022, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrey Borodin <amborodin86@gmail.com> writes:

I saw a thread in a social network[0] about GROUP BY ALL. The idea seems

useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

What would happen if there are aggregate functions in the tlist?
I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

IIUC some systems treat any non-aggregated column as an implicit group by
column. This proposal is an explicit way to enable that implicit behavior
in PostgreSQL. It is, as you note, an odd meaning for the word ALL.

We tend to not accept non-standard usability syntax extensions even if
others systems implement them. I don’t see this one ending up being an
exception…

David J.

Isaac Morland

isaac.morland@gmail.com

about 3 years ago

In reply to: Tom Lane (#2)

Re: GROUP BY ALL

On Sun, 18 Dec 2022 at 23:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrey Borodin <amborodin86@gmail.com> writes:

I saw a thread in a social network[0] about GROUP BY ALL. The idea seems

useful.

Isn't that just a nonstandard spelling of SELECT DISTINCT?

In a pure relational system, yes; but since Postgres allows duplicate rows,
both in actual table data and in intermediate and final result sets, no.
Although I'm pretty sure no aggregates other than count() are useful - any
other aggregate would always just combine count() copies of the duplicated
value in some way.

What would happen if there are aggregate functions in the tlist?

I'm not especially on board with "ALL" meaning "ALL (oh, but not
aggregates)".

The requested behaviour can be accomplished by an invocation something like:

select (t).*, count(*) from (select (…field1, field2, …) as t from
…tables…) s group by t;

So we collect all the required fields as a tuple, group by the tuple, and
then unpack it into separate columns in the outer query.

Vik Fearing

vik@postgresfriends.org

about 3 years ago

In reply to: Andrey Borodin (#1)

Re: GROUP BY ALL

On 12/19/22 05:19, Andrey Borodin wrote:

Hi hackers!

I saw a thread in a social network[0] about GROUP BY ALL. The idea seems useful.
I always was writing something like
select datname, usename, count(*) from pg_stat_activity group by 1,2;
and then rewriting to
select datname, usename, query, count(*) from pg_stat_activity group by 1,2;
and then "aaahhhh, add a number at the end".

With the proposed feature I can write just
select datname, usename, count(*) from pg_stat_activity group by all;

We already have GROUP BY ALL, but it doesn't do this.

PFA very dummy implementation just for a discussion. I think we can
add all non-aggregating targets.

What do you think?

I think this is a pretty terrible idea. If we want that kind of
behavior, we should just allow the GROUP BY to be omitted since without
grouping sets, it is kind of redundant anyway.

I don't know what my opinion is on that.
--
Vik Fearing

Bruce Momjian

bruce@momjian.us

about 3 years ago

In reply to: Vik Fearing (#6)

Re: GROUP BY ALL

On Mon, Dec 19, 2022 at 05:53:46PM +0100, Vik Fearing wrote:

I think this is a pretty terrible idea. If we want that kind of behavior,
we should just allow the GROUP BY to be omitted since without grouping sets,
it is kind of redundant anyway.

I don't know what my opinion is on that.

This is a very interesting concept. Because Postgres requires GROUP BY
of all non-aggregate columns of a target list, Postgres could certainly
automatically generate the GROUP BY. However, readers of the query
might not easily distinguish function calls from aggregates, so in a way
the GROUP BY is for the reader, not for the database server.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Embrace your flaws. They make you human, rather than perfect,
which you will never be.

Andrey Borodin

amborodin86@gmail.com

about 3 years ago

In reply to: Bruce Momjian (#7)

Re: GROUP BY ALL

On Fri, Jan 6, 2023 at 1:56 PM Bruce Momjian <bruce@momjian.us> wrote:

Because Postgres requires GROUP BY
of all non-aggregate columns of a target list, Postgres could certainly
automatically generate the GROUP BY. However, readers of the query
might not easily distinguish function calls from aggregates, so in a way
the GROUP BY is for the reader, not for the database server.

How about "SELECT a,b, count(*) FROM t GROUP AUTOMATICALLY;" ? And
then a shorthand for "SELECT a,b, count(*) FROM t GROUP;".

Anyway, the problem is not in clever syntax, but in the fact that it's
an SQL extension, not a standard...

Best regards, Andrey Borodin.