[PATCH] GROUP BY ALL
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.
The grammar is unambiguous, so can support this construct in lieu of
the traditional GROUP BY clause. Enclosed is a patch which adds this
via just scanning the TargetEntry list and adding anything that is not
an aggregate function call to the groupList.
Still need some docs; just throwing this out there and getting some feedback.
Thanks,
David
Attachments:
0001-WIP-groupAll-to-automatically-add-columns-to-the-gro.patchapplication/octet-stream; name=0001-WIP-groupAll-to-automatically-add-columns-to-the-gro.patchDownload+152-35
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.Still need some docs; just throwing this out there and getting some
feedback.
I strongly dislike adding this feature. I'd only consider supporting it if
it was part of the SQL standard.
Code is written once and read many times. This feature caters to
the writer, not the reader. And furthermore usage of this is prone to be
to the writer's detriment as well.
David J.
On Mon, 22 Jul 2024 at 17:34, David G. Johnston <david.g.johnston@gmail.com>
wrote:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net>
wrote:I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.Still need some docs; just throwing this out there and getting some
feedback.I strongly dislike adding this feature. I'd only consider supporting it
if it was part of the SQL standard.Code is written once and read many times. This feature caters to
the writer, not the reader. And furthermore usage of this is prone to be
to the writer's detriment as well.
And for when this might be useful, the syntax for it already exists,
although a spurious error message is generated:
odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^
I'm not sure exactly what's going on here — it's like it's still seeing the
table name in the field list as only a table name and not the value
corresponding to the whole table as a row value (But in general I'm not
happy with the system's ability to figure out that a column's value has
only one possibility given the grouping columns). You can work around:
odyssey=> with t as (select uw_term, count(*) from uw_term group by
uw_term) select (uw_term).*, count from t;
This query works.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.
I strongly dislike adding this feature. I'd only consider supporting it if
it was part of the SQL standard.
Yeah ... my recollection is that we already rejected this idea.
If you want to re-litigate that, "throwing this out there" is
not a sufficient argument.
(Personally, I'd wonder exactly what ALL is quantified over: the
whole output of the FROM clause, or only columns mentioned in the
SELECT tlist, or what? And why that choice rather than another?)
regards, tom lane
Isaac Morland <isaac.morland@gmail.com> writes:
And for when this might be useful, the syntax for it already exists,
although a spurious error message is generated:
odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^
I'm not sure exactly what's going on here
The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
uw_term.col3, ...", and those single-column Vars don't match the
whole-row Var appearing in the GROUP BY list. I guess if we
think this is important, we could add a proof rule saying that
a per-column Var is functionally dependent on a whole-row Var
of the same relation. Odd that the point hasn't come up before
(though I guess that suggests that few people try this).
regards, tom lane
<div class="52245733ac3b195ee6f02a3f4f5dbe50ApplePlainTextBody" dir="auto" style="word-wrap:break-word"><br /><br /><blockquote>On 23 Jul 2024, at 00:40, Isaac Morland <<a href="mailto:isaac.morland@gmail.com">isaac.morland@gmail.com</a>> wrote:<br /><br />odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;<br />ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function<br />LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;<br /></blockquote><br /><div>AFAIR this problem was solved in my implementation [0]</div><div><br /></div><div><blockquote>On 23 Jul 2024, at 01:29, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<br /><br />(Personally, I'd wonder exactly what ALL is quantified over: the<br />whole output of the FROM clause, or only columns mentioned in the<br />SELECT tlist, or what? And why that choice rather than another?)<br /></blockquote><br /></div><div>I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora box of syntax sugar extensions which may will be incompatible with future standards.</div><div>If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT column and better GROUP BY.</div><div><br /></div><div><br /></div><div>Best regards, Andrey Borodin.</div><div><br /></div><div><br /></div><div>[0] <a href="/messages/by-id/CAAhFRxjyTO5BHn9y1oOSEp0TtpTDTTTb7HJBNhTG+i3-hXC0XQ@mail.gmail.com">https://www.postgresql.org/message-id/flat/CAAhFRxjyTO5BHn9y1oOSEp0TtpTDTTTb7HJBNhTG+i3-hXC0XQ@mail.gmail.com</a></div></div>
Import Notes
Resolved by subject fallback
On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.Still need some docs; just throwing this out there and getting some feedback.
I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard.
Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usage of this is prone to be to the writer's detriment as well.
I'd say this feature (at least for me) caters to the investigator;
someone who is interactively looking at data hence why it would cater
to the writer. Consider acquainting yourself with a large table that
has a large number of annoying-named fields where you want to look at
how different data is correlated or broken-down. Something along the
lines of:
SELECT last_name, substring(first_name,1,1) as first_initial,
income_range, count(*) FROM census_data GROUP BY ALL;
If you are iteratively looking at things, adding or removing fields
from your breakdown, you only need to change it in a single place, the
tlist. Additionally, expressions can be used transparently without
needing to repeat them. (Yes, in practice, I'd often use GROUP BY 1,
2, say, but if you add more fields to this you need to edit in
multiple places.)
David
On Mon, Jul 22, 2024 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.I strongly dislike adding this feature. I'd only consider supporting it if
it was part of the SQL standard.Yeah ... my recollection is that we already rejected this idea.
If you want to re-litigate that, "throwing this out there" is
not a sufficient argument.
Heh, fair enough. I actually wrote the patch after encountering the
syntax in DuckDB and it seemed easy enough to add to Postgres while
providing some utility, then ended up seeing a thread about it later.
I did not get the sense that this had been globally rejected; though
there were definitely voices against it, it seemed to trail off rather
than coming to a conclusion.
(Personally, I'd wonder exactly what ALL is quantified over: the
whole output of the FROM clause, or only columns mentioned in the
SELECT tlist, or what? And why that choice rather than another?)
My intention here was to basically be a shorthand for "group by
specified non-aggregate fields in the select list". Perhaps I'm not
being creative enough, but what is the interpretation/use case for
anything else? :-)
While there are other ways to accomplish these things, making an easy
way to GROUP BY with aggregate queries would be useful in the field,
particularly when doing iterative discovery work would save a lot of
time with a situation that is both detectable and hits users with
errors all the time.
I'm not married to the exact syntax of this feature; anything else
short and consistent could work if `ALL` is considered to potentially
gain a different interpretation in the future.
David
On Tue, Jul 23, 2024 at 8:21 AM Andrei Borodin <x4mmm@yandex-team.ru> wrote:
On 23 Jul 2024, at 00:40, Isaac Morland <isaac.morland@gmail.com> wrote:
odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;AFAIR this problem was solved in my implementation [0]
On 23 Jul 2024, at 01:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:
(Personally, I'd wonder exactly what ALL is quantified over: the
whole output of the FROM clause, or only columns mentioned in the
SELECT tlist, or what? And why that choice rather than another?)I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora box of syntax sugar extensions which may will be incompatible with future standards.
If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT column and better GROUP BY.
GROUP BY AUTO also seems fine here to me; I understand the desire to
avoid major incompatible syntax changes; GROUP BY ALL does exist in
multiple products so it's not unprecedented.
I wrote my patch before seeing your thread, sorry I missed that. :-)
David
On Mon, Jul 22, 2024 at 4:41 PM Isaac Morland <isaac.morland@gmail.com> wrote:
And for when this might be useful, the syntax for it already exists, although a spurious error message is generated:
odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^
This is with my patch, or existing postgres? Grouping by record is
not actually what this patch is trying to do, so perhaps there is some
ambiguity; this is intended to GROUP BY any select item that is not an
aggregate function.
On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote:
My intention here was to basically be a shorthand for "group by
specified non-aggregate fields in the select list". Perhaps I'm not
being creative enough, but what is the interpretation/use case for
anything else? :-)
I am somewhat against this feature.
It is too much magic for my taste.
It might be handy for interactive use, but I would frown at an application
that uses code like that, much like I'd frown at "SELECT *" in application code.
Yours,
Laurenz Albe
On Tue, Jul 23, 2024 at 10:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2024-07-23 at 08:37 -0500, David Christensen wrote:
My intention here was to basically be a shorthand for "group by
specified non-aggregate fields in the select list". Perhaps I'm not
being creative enough, but what is the interpretation/use case for
anything else? :-)I am somewhat against this feature.
It is too much magic for my taste.It might be handy for interactive use, but I would frown at an application
that uses code like that, much like I'd frown at "SELECT *" in application code.
Sure, not everything that makes things easier is strictly necessary;
we could require `CAST(field AS text)` instead of `::text`, make
subqueries required for transforming oids into specific system tables
instead of `::regfoo` casts, any number of other choices, remove
`SELECT *` as a parse option, but making it easier to do common things
interactively as a DBA has value as well.
David
On Tue, Jul 23, 2024 at 9:48 AM David Christensen <david@pgguru.net> wrote:
Sure, not everything that makes things easier is strictly necessary;
we could require `CAST(field AS text)` instead of `::text`,
Probably should have...being standard and all. Though syntactic sugar is
quite different from new behavior - transforming :: to CAST is
straight-forward.
make
subqueries required for transforming oids into specific system tables
instead of `::regfoo` casts,
Since OID is non-standard this falls within our purview.
any number of other choices, remove
`SELECT *` as a parse option,
Again, standard dictated.
but making it easier to do common things
interactively as a DBA has value as well.
Agreed, but this isn't a clear-cut win, and doesn't have standard
conformance to tip the scale over fully.
Also, there are so many better tools for data exploration. Removing this
quirk only marginally closes that gap.
David J.
On 7/22/24 15:43, Tom Lane wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
And for when this might be useful, the syntax for it already exists,
although a spurious error message is generated:odyssey=> select (uw_term).*, count(*) from uw_term group by uw_term;
ERROR: column "uw_term.term_id" must appear in the GROUP BY clause or be
used in an aggregate function
LINE 1: select (uw_term).*, count(*) from uw_term group by uw_term;
^I'm not sure exactly what's going on here
The SELECT entry is expanded into "uw_term.col1, uw_term.col2,
uw_term.col3, ...", and those single-column Vars don't match the
whole-row Var appearing in the GROUP BY list. I guess if we
think this is important, we could add a proof rule saying that
a per-column Var is functionally dependent on a whole-row Var
of the same relation. Odd that the point hasn't come up before
(though I guess that suggests that few people try this).
I was just using this group-by-row feature last week to implement a temporal outer join in a way
that would work for arbitrary tables. Here is some example SQL:
That does `GROUP BY a` then `SELECT (x.a).*`.[1]Actually I see it does `GROUP BY a, a.valid_at`, but that is surely more than I need. I think that `a.valid_at` is leftover from a previous version of the query.
It is very useful for writing queries that don't want to know about the structure of the row.
I noticed the same error as Isaac. I worked around the problem by wrapping it in a subquery and
decomposing the row outside. It's already an obscure feature, and an easy workaround might be why
you haven't heard complaints before. I wouldn't mind writing a patch for that rule when I get a
chance (if no one else gets to it first.)
[1]: Actually I see it does `GROUP BY a, a.valid_at`, but that is surely more than I need. I think that `a.valid_at` is leftover from a previous version of the query.
that `a.valid_at` is leftover from a previous version of the query.
Yours,
--
Paul ~{:-)
pj@illuminatedcomputing.com
On 23.07.24 00:29, Tom Lane wrote:
(Personally, I'd wonder exactly what ALL is quantified over: the
whole output of the FROM clause, or only columns mentioned in the
SELECT tlist, or what? And why that choice rather than another?)
Looks like the main existing implementations take it to mean all entries
in the SELECT list that are not aggregate functions.
https://duckdb.org/docs/sql/query_syntax/groupby.html#group-by-all
https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-groupby.html#parameters
https://docs.snowflake.com/en/sql-reference/constructs/group-by#parameters
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.
+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every time
Regarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
GROUP BY clause. Again this is already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference with columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as snowflake, databricks and duckdb I personally don't see a
big problem. Then we could try and make this be part of the standard
in the next version of the standard.
On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote:
I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora box of syntax sugar extensions which may will be incompatible with future standards.
If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT column and better GROUP BY.
Personally my number one enhancement would be allowing a trailing
comma after the last column in the SELECT clause.
Hi
st 24. 7. 2024 v 10:57 odesílatel Jelte Fennema-Nio <postgres@jeltef.nl>
napsal:
On Mon, 22 Jul 2024 at 22:55, David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every timeRegarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
GROUP BY clause. Again this is already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference with columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as snowflake, databricks and duckdb I personally don't see a
big problem. Then we could try and make this be part of the standard
in the next version of the standard.
Aggregation against more columns is pretty slow and memory expensive in
Postgres.
DuckDB is an analytic database with different storage, different executors.
I like it very much, but I am not sure if I want to see these features in
Postgres.
Lot of developers are not very smart, and with proposed feature, then
instead to write correct and effective query, then they use just GROUP BY
ALL. Slow query should look like a slow query :-)
Regards
Pavel
On 24 Jul 2024, at 13:58, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
On Tue, 23 Jul 2024 at 15:22, Andrei Borodin <x4mmm@yandex-team.ru> wrote:
I'd like to have GROUP BY AUTO (I also proposed version GROUP BY SURPRISE ME). But I wouldn't like to open pandora box of syntax sugar extensions which may will be incompatible with future standards.
If we could have extensible grammar - I'd be happy to have a lot of such enhancements. My top 2 are FROM table SELECT column and better GROUP BY.Personally my number one enhancement would be allowing a trailing
comma after the last column in the SELECT clause.
Yes, trailing comma sounds great too.
One more similar syntax sugar I can think of. I see lots of queries like
SELECT somtheing
FROM table1
WHERE 1=1
and id = x
--and col1 = val1
and col2 = val2
I was wondering where does that "1=1" comes from. It's because developer comment condition one by one like "--, col1 = val1". And they do not want to cope with and\or continuation.
Best regards, Andrey Borodin.
PS. Seems like Mail.App mangled my previous message despite using plain text. It's totally broken in archives...
On Tue, Jul 23, 2024 at 6:53 PM David Christensen <david@pgguru.net> wrote:
On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:On Mon, Jul 22, 2024 at 1:55 PM David Christensen <david@pgguru.net> wrote:
I see that there'd been some chatter but not a lot of discussion about
a GROUP BY ALL feature/functionality. There certainly is utility in
such a construct IMHO.Still need some docs; just throwing this out there and getting some feedback.
I strongly dislike adding this feature. I'd only consider supporting it if it was part of the SQL standard.
Code is written once and read many times. This feature caters to the writer, not the reader. And furthermore usage of this is prone to be to the writer's detriment as well.
I'd say this feature (at least for me) caters to the investigator;
someone who is interactively looking at data hence why it would cater
to the writer. Consider acquainting yourself with a large table that
has a large number of annoying-named fields where you want to look at
how different data is correlated or broken-down. Something along the
lines of:
To me this looks like a feature that a data exploration tool may
implement instead of being part of the server. It would then provide
more statistics about each correlation/column set etc.
--
Best Wishes,
Ashutosh Bapat