group by true now errors with non-integer constant in GROUP BY
Hi Folks
I'm trying to upgrade our development environment from 13.11 to 15.4 as we
look forward to starting to use merge and a few other new features.
The only error that we have with our pgtap tests with is:
ERROR: non-integer constant in GROUP BY
It is only happening to a particular "group by true" that is dynamically
compiled when the function parameter asks to group by all data instead of
grouping other time series data.
I wrote the following script to reproduce also with an example of what we
do with the group by when the parameter is not all.:
#!/usr/bin/env bash
{
for version in "13.11" "14.9" "15.4" "15.0" "15.1" "15.0" "15.1" "15.2"
"15.3"; do #
docker rm -f postgres || true
echo "Testing postgres:$version"
docker run --rm --name postgres --net host -ePOSTGRES_USER=postgres -e
POSTGRES_PASSWORD=mysecretpassword -e PGPORT=54321 -d postgres:$version
timeout 90s /usr/bin/env bash -c "until docker exec postgres pg_isready ;
do sleep 5 ; done" # wait for db to be ready
psql -v ON_ERROR_STOP=on
postgresql://postgres:mysecretpassword@localhost:54321/postgres
<<EOF
CREATE TABLE IF NOT EXISTS test_data (id serial, proccess_time timestamp
with time zone, value NUMERIC);
INSERT INTO test_data(proccess_time, value)
SELECT test_time, random() * 100
FROM generate_series(now() - interval '30 days', now() + interval '30
days', INTERVAL '30 MIN') d(test_time);
SELECT (array_agg(proccess_time order by proccess_time asc))[1], avg(value)
FROM test_data GROUP BY date_part('week' , proccess_time); -- working
example
SELECT (array_agg(proccess_time order by proccess_time asc))[1], avg(value)
FROM test_data GROUP BY true;
EOF
done
}
The issues appears after 15.0
Thanks
--
Email: david.j.micallef@gmail.com
On Mon, Aug 28, 2023 at 1:28 PM David Micallef <david.j.micallef@gmail.com>
wrote:
The only error that we have with our pgtap tests with is:
ERROR: non-integer constant in GROUP BYIt is only happening to a particular "group by true" that is dynamically
compiled when the function parameter asks to group by all data instead of
grouping other time series data.
The issues appears after 15.0
I can confirm the reproducer script fails with commit 941460fcf731a ("Add
Boolean node"). (So actually all 15.x were affected.)
The attached fixes it for me (applies onto v15). It looks simple enough,
but doesn't comfortably fit so might need some tweaking (needs tests, too).
--
John Naylor
EDB: http://www.enterprisedb.com
Attachments:
fix-group-by-bool.patchtext/x-patch; charset=US-ASCII; name=fix-group-by-bool.patchDownload+4-0
John Naylor <john.naylor@enterprisedb.com> writes:
On Mon, Aug 28, 2023 at 1:28 PM David Micallef <david.j.micallef@gmail.com>
wrote:ERROR: non-integer constant in GROUP BY
I can confirm the reproducer script fails with commit 941460fcf731a ("Add
Boolean node"). (So actually all 15.x were affected.)
Yeah, same result from bisecting here. I had had the idea that this
was an intentional semantics change to reduce the probability of error,
but that commit didn't mention any such thing, so that's a tough claim
to make as far as the historical intent goes.
Having said that ... it seems to me that it was pure coincidence that
"group by true" worked before, and I'm not sure we should install a
grotty hack to make it work again. In particular, why should we allow
Boolean Consts but not other non-integer Consts? (And if we do, don't
we need to change that error message?)
The bigger picture here is: what is the use-case for grouping by a
constant at all? Assuming that it is an error seems like a good
foolproofing restriction. The reason we felt we could keep the
"group by N" SQL92-ism after SQL99 redefined GROUP BY arguments is
exactly that there's no obvious use-case for grouping by a constant.
As soon as you allow it, "group by N" becomes hopelessly ambiguous.
So my druthers would be to reject this as a non-bug. But if we accept
it as something to fix, we need to revisit exactly which conditions
are errors here. Perhaps rather than "reject all non-integer cases",
we should only reject the Float case, and let others fall through to
the SQL99 code. (I would not be happy allowing Float, because that'd
mean that "group by 4" and "group by 4.0" mean fundamentally different
things.)
regards, tom lane
On 2023-08-28 Mo 15:56, Tom Lane wrote:
The bigger picture here is: what is the use-case for grouping by a
constant at all? Assuming that it is an error seems like a good
foolproofing restriction. The reason we felt we could keep the
"group by N" SQL92-ism after SQL99 redefined GROUP BY arguments is
exactly that there's no obvious use-case for grouping by a constant.
As soon as you allow it, "group by N" becomes hopelessly ambiguous.So my druthers would be to reject this as a non-bug. But if we accept
it as something to fix, we need to revisit exactly which conditions
are errors here. Perhaps rather than "reject all non-integer cases",
we should only reject the Float case, and let others fall through to
the SQL99 code. (I would not be happy allowing Float, because that'd
mean that "group by 4" and "group by 4.0" mean fundamentally different
things.)
I vote for treating it as a non-bug.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
On Tue, 29 Aug 2023 at 07:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Having said that ... it seems to me that it was pure coincidence that
"group by true" worked before, and I'm not sure we should install a
grotty hack to make it work again. In particular, why should we allow
Boolean Consts but not other non-integer Consts? (And if we do, don't
we need to change that error message?)
Is it really a grotty hack? Isn't it just as simple as the attached?
The bigger picture here is: what is the use-case for grouping by a
constant at all? Assuming that it is an error seems like a good
foolproofing restriction. The reason we felt we could keep the
"group by N" SQL92-ism after SQL99 redefined GROUP BY arguments is
exactly that there's no obvious use-case for grouping by a constant.
As soon as you allow it, "group by N" becomes hopelessly ambiguous.
The new behaviour feels a bit inconsistent to me as it stands today.
I can't write GROUP BY true, but I can write GROUP BY 1=1, which gets
it beyond the parser and allows constant folding to turn it into GROUP
BY true, which I couldn't specify because the parser would complain.
So my druthers would be to reject this as a non-bug. But if we accept
it as something to fix, we need to revisit exactly which conditions
are errors here. Perhaps rather than "reject all non-integer cases",
we should only reject the Float case, and let others fall through to
the SQL99 code. (I would not be happy allowing Float, because that'd
mean that "group by 4" and "group by 4.0" mean fundamentally different
things.)
I had a look on dbfiddle and I see that MySQL 8.0 and SQLlite all
allow GROUP BY true. I think if we used to, and those other databases
do, then we might want to reconsider supporting it again, especially
so now that someone has complained. I'm assuming it's just as simple
as the attached patch, but I'm happy to listen if I've underestimated
the complexity.
David
Attachments:
grotty_hack.patchapplication/octet-stream; name=grotty_hack.patchDownload+1-9
David Rowley <dgrowleyml@gmail.com> writes:
On Tue, 29 Aug 2023 at 07:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The bigger picture here is: what is the use-case for grouping by a
constant at all? Assuming that it is an error seems like a good
foolproofing restriction. The reason we felt we could keep the
"group by N" SQL92-ism after SQL99 redefined GROUP BY arguments is
exactly that there's no obvious use-case for grouping by a constant.
As soon as you allow it, "group by N" becomes hopelessly ambiguous.
The new behaviour feels a bit inconsistent to me as it stands today.
I can't write GROUP BY true, but I can write GROUP BY 1=1, which gets
it beyond the parser and allows constant folding to turn it into GROUP
BY true, which I couldn't specify because the parser would complain.
Sure, you can write any constant expression, for instance NULL::bool
would work. The question is where do we draw the line between SQL92
and SQL99 behaviors. I think "an undecorated constant is SQL92, and
therefore it must be an integer matching an output column number" is
a nice simple rule. The fact that TRUE and FALSE were not previously
treated as undecorated constants is an unintentional wart of the old
implementation, not something we ought to preserve.
I had a look on dbfiddle and I see that MySQL 8.0 and SQLlite all
allow GROUP BY true.
What do they do with GROUP BY 1, or GROUP BY 10000, or GROUP BY 1.0 ?
I think if we used to, and those other databases
do, then we might want to reconsider supporting it again, especially
so now that someone has complained. I'm assuming it's just as simple
as the attached patch, but I'm happy to listen if I've underestimated
the complexity.
Sure, changing the behavior is trivial. Whether we *should* change
the behavior, and if so to what, is less so. I'm not really on
board with "we need to maintain bug-compatibility with an old
implementation artifact".
BTW, I poked around and couldn't find anything explaining this
fine point in the SGML docs, although the comments in
findTargetlistEntrySQL92 are clear about it. If we do anything
at all here, I think that ought to include documenting the behavior
more clearly (and I'm curious to see how you'd propose to explain
the behavior you want to users).
regards, tom lane
On Tue, 29 Aug 2023 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
I had a look on dbfiddle and I see that MySQL 8.0 and SQLlite all
allow GROUP BY true.What do they do with GROUP BY 1, or GROUP BY 10000, or GROUP BY 1.0 ?
All treat only integer constants as column references. Out-of-range
integer values are reported as errors. Other const types appear to be
treated as expressions rather than column references.
BTW, I poked around and couldn't find anything explaining this
fine point in the SGML docs, although the comments in
findTargetlistEntrySQL92 are clear about it. If we do anything
at all here, I think that ought to include documenting the behavior
more clearly (and I'm curious to see how you'd propose to explain
the behavior you want to users).
The rule and how to explain it seems fairly simple to me. Integer
constants are treated as column references to their corresponding
1-based position in the SELECT clause. Anything else is treated as an
expression.
David
On Tue, Aug 29, 2023 at 8:55 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 29 Aug 2023 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, I poked around and couldn't find anything explaining this
fine point in the SGML docs, although the comments in
findTargetlistEntrySQL92 are clear about it. If we do anything
at all here, I think that ought to include documenting the behavior
more clearly (and I'm curious to see how you'd propose to explain
the behavior you want to users).The rule and how to explain it seems fairly simple to me. Integer
constants are treated as column references to their corresponding
1-based position in the SELECT clause. Anything else is treated as an
expression.
Seems reasonable to me.
--
John Naylor
EDB: http://www.enterprisedb.com
On Tue, 29 Aug 2023 at 18:39, John Naylor <john.naylor@enterprisedb.com> wrote:
On Tue, Aug 29, 2023 at 8:55 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 29 Aug 2023 at 13:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, I poked around and couldn't find anything explaining this
fine point in the SGML docs, although the comments in
findTargetlistEntrySQL92 are clear about it. If we do anything
at all here, I think that ought to include documenting the behavior
more clearly (and I'm curious to see how you'd propose to explain
the behavior you want to users).The rule and how to explain it seems fairly simple to me. Integer
constants are treated as column references to their corresponding
1-based position in the SELECT clause. Anything else is treated as an
expression.Seems reasonable to me.
Here's how I think we should proceed:
1. Re-allow Boolean constants in GROUP BY in v15 and v16 by
backpatching John's fix to special case Booleans.
2. In master only, remove the special case added in #1 and allow any
non-integer constants to be treated as expressions.
I think #2 is a good move for the following reasons:
a) Seems consistent with other RDBMSs (See [1]/messages/by-id/CAApHDvomA1bZy=0AYUcTjDWaCeedcPeDBo6PV0VhpVeo2jG1uQ@mail.gmail.com).
b) Gets rid of the special case added in #1 to allow booleans
c) Consistent with things like "JOIN ... ON true".
d) May allow simplified coding in ORMs. Without a GROUP BY clause,
you're at the mercy of there being any aggregate functions in the
target list to define the grouping behaviour.
I've included 2 patches, 0001 for #1 (John's patch with the comment
adjusted to explain the special case) and 0002 for #2.
Does anyone think we should do this differently?
David
[1]: /messages/by-id/CAApHDvomA1bZy=0AYUcTjDWaCeedcPeDBo6PV0VhpVeo2jG1uQ@mail.gmail.com
Attachments:
v1-0001-Re-allow-Boolean-constants-in-GROUP-BY.patchapplication/octet-stream; name=v1-0001-Re-allow-Boolean-constants-in-GROUP-BY.patchDownload+8-1
v1-0002-Allow-any-constant-types-in-GROUP-BY-clause.patchapplication/octet-stream; name=v1-0002-Allow-any-constant-types-in-GROUP-BY-clause.patchDownload+26-35
David Rowley <dgrowleyml@gmail.com> writes:
2. In master only, remove the special case added in #1 and allow any
non-integer constants to be treated as expressions.
I think #2 is a good move for the following reasons:
FTR, I still think this is a bad idea. It will add more confusion
than it removes, and I don't buy that it will confer any advantages,
because nobody asked for it previously.
However, assuming that I'm going to be out-voted: your docs changes
still need work. That phrasing makes it sound like an output column
name could be "expressed as an integer literal". Maybe we should
restructure the whole sentence, perhaps along the lines of
... class="parameter">expression</replaceable> used inside a
<replaceable class="parameter">grouping_element</replaceable>
can be an input column name or an arbitrary expression formed
from input column values. For backwards compatibility with
SQL-92, the <replaceable>expression</replaceable> can also be
an output column name or an integer literal representing the
ordinal number of an output column. In case of ambiguity, a
<literal>GROUP BY</literal> item that is just a name will be
interpreted as an input column name rather than an output column
name.
Neither formulation addresses the case of non-integer literals
head on. I'm not quite sure if it's worth adding another sentence
to do so.
regards, tom lane
On Mon, 18 Sept 2023 at 12:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
2. In master only, remove the special case added in #1 and allow any
non-integer constants to be treated as expressions.I think #2 is a good move for the following reasons:
FTR, I still think this is a bad idea. It will add more confusion
than it removes, and I don't buy that it will confer any advantages,
because nobody asked for it previously.
I'm not dead set on it. I just don't think we can do exactly nothing
about this. At the very least we need to mention in REL_15_STABLE's
release-15.sgml in the incompatibilities section.
Another reason for #2 which I forgot to add was that it gets rid of
the need to have the "non-integer constant in ..." error message and
saves about 8 lines of code and a string constant table entry.
I'll happily wait to see if anyone else has any thoughts on this. The
votes seem roughly 2 vs 2 at the moment.
David
On Sun, Sep 17, 2023 at 5:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
2. In master only, remove the special case added in #1 and allow any
non-integer constants to be treated as expressions.I think #2 is a good move for the following reasons:
FTR, I still think this is a bad idea. It will add more confusion
than it removes, and I don't buy that it will confer any advantages,
because nobody asked for it previously.
I agree. Maintaining bug compatibility doesn't seem worth it in this instance.
David Micallef mentioned that only one query was affected. You have to
draw the line somewhere.
--
Peter Geoghegan
On Thu, 21 Sept 2023 at 16:58, Peter Geoghegan <pg@bowt.ie> wrote:
On Sun, Sep 17, 2023 at 5:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
FTR, I still think this is a bad idea. It will add more confusion
than it removes, and I don't buy that it will confer any advantages,
because nobody asked for it previously.I agree. Maintaining bug compatibility doesn't seem worth it in this instance.
Thanks for chipping in.
Just for the record, I'm more keen on removing the special case that
disallows this. For me it's more about POLA. I expect that anywhere
I can write an expression in SQL, that I can put a Const in its place
and not receive an error to say constants are disallowed.
I mentioned this to Andres in a meeting yesterday and it seems to be
against allowing Consts, so I seem to be losing this one.
David
Dear all,
I would like to add something to the question: "why grouping on a constant
would be necessary", even if it is plain wrong.
Background info: I stumbled upon this thread while upgrading our database
from postgresql-14 to postgresql-16 making use of a Java application using
of JPA and Hibernate. Hibernate is a widely used framework and this library
will compose queries (under certain conditions (still unknown to me))
with *GROUP
BY coulmn1, column2, true* *<-- *
Hibernate has been doing this quircky thing for many many years and even in
the latest release does so. So, potentionally many Java Enterprise
applications will be tied to postgresql-14 if there is no compatibility
switch possible.
In our case a tiny compatability switch would be a livesaver. Of courcse, I
will try to convince the Hibernate community to have this unnecessary
constant removed but that still leaves all legacy code to not work with
postgresql-15+ databases which would be pitiful!
Kind regards,
Dennis Brouwer
On Thu, 19 Oct 2023 at 13:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
John Naylor <john.naylor@enterprisedb.com> writes:
On Mon, Aug 28, 2023 at 1:28 PM David Micallef <
david.j.micallef@gmail.com>
wrote:
ERROR: non-integer constant in GROUP BY
I can confirm the reproducer script fails with commit 941460fcf731a ("Add
Boolean node"). (So actually all 15.x were affected.)Yeah, same result from bisecting here. I had had the idea that this
was an intentional semantics change to reduce the probability of error,
but that commit didn't mention any such thing, so that's a tough claim
to make as far as the historical intent goes.Having said that ... it seems to me that it was pure coincidence that
"group by true" worked before, and I'm not sure we should install a
grotty hack to make it work again. In particular, why should we allow
Boolean Consts but not other non-integer Consts? (And if we do, don't
we need to change that error message?)The bigger picture here is: what is the use-case for grouping by a
constant at all? Assuming that it is an error seems like a good
foolproofing restriction. The reason we felt we could keep the
"group by N" SQL92-ism after SQL99 redefined GROUP BY arguments is
exactly that there's no obvious use-case for grouping by a constant.
As soon as you allow it, "group by N" becomes hopelessly ambiguous.So my druthers would be to reject this as a non-bug. But if we accept
it as something to fix, we need to revisit exactly which conditions
are errors here. Perhaps rather than "reject all non-integer cases",
we should only reject the Float case, and let others fall through to
the SQL99 code. (I would not be happy allowing Float, because that'd
mean that "group by 4" and "group by 4.0" mean fundamentally different
things.)regards, tom lane
On Thu, 2023-10-19 at 14:07 +0200, Dennis Brouwer wrote:
Hibernate is a widely used framework and this library will compose queries
(under certain conditions (still unknown to me))
with GROUP BY coulmn1, column2, true <--Hibernate has been doing this quircky thing for many many years and even
in the latest release does so. So, potentionally many Java Enterprise
applications will be tied to postgresql-14 if there is no compatibility
switch possible.In our case a tiny compatability switch would be a livesaver. Of courcse,
I will try to convince the Hibernate community to have this unnecessary
constant removed but that still leaves all legacy code to not work with
postgresql-15+ databases which would be pitiful!
I understand your pain.
But according to my reading of the SQL standard, it only allows for
regular column references in GROUP BY. Unless I got something wrong,
that would mean the the ball is in Hibernate's court. It ought to
produce correct SQL.
Yours,
Laurenz Albe
On Thursday, October 19, 2023, Laurenz Albe <laurenz.albe@cybertec.at>
wrote:
On Thu, 2023-10-19 at 14:07 +0200, Dennis Brouwer wrote:
Hibernate is a widely used framework and this library will compose
queries
(under certain conditions (still unknown to me))
with GROUP BY coulmn1, column2, true <--Hibernate has been doing this quircky thing for many many years and even
in the latest release does so. So, potentionally many Java Enterprise
applications will be tied to postgresql-14 if there is no compatibility
switch possible.In our case a tiny compatability switch would be a livesaver. Of courcse,
I will try to convince the Hibernate community to have this unnecessary
constant removed but that still leaves all legacy code to not work with
postgresql-15+ databases which would be pitiful!I understand your pain.
But according to my reading of the SQL standard, it only allows for
regular column references in GROUP BY. Unless I got something wrong,
that would mean the the ball is in Hibernate's court. It ought to
produce correct SQL.
My takeaway is that if Hibernate is able to produce this and get away with
it in the various jdbc drivers it needs to work with then our existing
choice to extend beyond the standard seems justified as others must be
doing it as well. Introducing a regression now on the basis of standard
conformance is just going to harm our reputation with minimal benefit, all
seemingly accrued by developers, not users.
David J.
On 10/19/23 15:18, Laurenz Albe wrote:
On Thu, 2023-10-19 at 14:07 +0200, Dennis Brouwer wrote:
Hibernate is a widely used framework and this library will compose queries
(under certain conditions (still unknown to me))
with GROUP BY coulmn1, column2, true <--Hibernate has been doing this quircky thing for many many years and even
in the latest release does so. So, potentionally many Java Enterprise
applications will be tied to postgresql-14 if there is no compatibility
switch possible.In our case a tiny compatability switch would be a livesaver. Of courcse,
I will try to convince the Hibernate community to have this unnecessary
constant removed but that still leaves all legacy code to not work with
postgresql-15+ databases which would be pitiful!I understand your pain.
But according to my reading of the SQL standard, it only allows for
regular column references in GROUP BY. Unless I got something wrong,
that would mean the the ball is in Hibernate's court. It ought to
produce correct SQL.
The answer is not as easy as that. It is true that the standard
requires a <column reference> for each element of the GROUP BY clause,
but it is also true that PostgreSQL allows arbitrary expressions.
Why is this non-standard query allowed:
vik=# select true group by true or true;
?column?
----------
t
(1 row)
but not this one?
vik=# select true group by true;
ERROR: non-integer constant in GROUP BY
LINE 1: select true group by true;
^
I may have oversimplified this example, but as long as the value is
present in the SELECT, logic would dictate that we can group by it.
The correct thing to do would be to *at least* get rid of the horrible
monstrosity that is "GROUP BY 1", but that is probably never going to
happen. This syntax was never part of the standard, and the "ORDER BY
1" syntax it was calqued upon was ripped out in SQL-99.
--
Vik Fearing
On 10/20/23 07:44, Vik Fearing wrote:
I may have oversimplified this example, but as long as the value is
present in the SELECT, logic would dictate that we can group by it.
Oops, I have this backwards: the projection comes after the grouping.
--
Vik Fearing
On Fri, 2023-10-20 at 07:44 +0200, Vik Fearing wrote:
But according to my reading of the SQL standard, it only allows for
regular column references in GROUP BY. Unless I got something wrong,
that would mean the the ball is in Hibernate's court. It ought to
produce correct SQL.The answer is not as easy as that. It is true that the standard
requires a <column reference> for each element of the GROUP BY clause,
but it is also true that PostgreSQL allows arbitrary expressions.Why is this non-standard query allowed:
vik=# select true group by true or true;
?column?
----------
t
(1 row)but not this one?
vik=# select true group by true;
ERROR: non-integer constant in GROUP BY
LINE 1: select true group by true;
^I may have oversimplified this example, but as long as the value is
present in the SELECT, logic would dictate that we can group by it.
I agree that it is desirable to fix this regression.
Still, we shouldn't exonerate Hibernate from fixing the junk SQL
it produces.
Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
I agree that it is desirable to fix this regression.
It's not a regression. If anything, it's a bug fix, because the
code is now doing what it intended to do all along: reject all
non-integral constants. The previous behavior was exposing an
implementation detail, namely that "true" wasn't a simple literal
constant according to the older grammar. But all of these were
and still are rejected:
GROUP BY 'true';
GROUP BY 1.0;
GROUP BY null;
I'm not really satisfied with concluding that we need to be
bug-compatible (literally) with an old implementation wart forever.
Still, we shouldn't exonerate Hibernate from fixing the junk SQL
it produces.
I'm curious as to how this incompatibility escaped notice for a full
year. Does Hibernate emit this SQL only in narrow corner cases?
regards, tom lane