Syntax bug? Group by?

Started by Mark Woodwardover 19 years ago26 messageshackers
Jump to latest
#1Mark Woodward
pgsql@mohawksoft.com

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

#2Stephen Frost
sfrost@snowman.net
In reply to: Mark Woodward (#1)
Re: Syntax bug? Group by?

* Mark Woodward (pgsql@mohawksoft.com) wrote:

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

Thanks,

Stephen

#3D'Arcy J.M. Cain
darcy@druid.net
In reply to: Stephen Frost (#2)
Re: Syntax bug? Group by?

On Tue, 17 Oct 2006 12:08:07 -0400
Stephen Frost <sfrost@snowman.net> wrote:

* Mark Woodward (pgsql@mohawksoft.com) wrote:

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

One column value doesn't necessarily mean one tuple unless it has a
unique index on that column.

SELECT COUNT(*) FROM table WHERE field = 'value';

That's perfectly reasonable. You don't need the GROUP BY clause.

However, this doesn't sound like a hackers question. Next time, please
ask on another list such as pgsql-sql or even pgsql-novice. You can
review the mailing lists and their purpose at
http://www.postgresql.org/community/lists/

-- 
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
#4Shane Ambler
pgsql@007Marketing.com
In reply to: Stephen Frost (#2)
Re: Syntax bug? Group by?

Stephen Frost wrote:

* Mark Woodward (pgsql@mohawksoft.com) wrote:

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Technically speaking, if you're asking for a specific tuple, should you
be allowed to request an aggregation?

Only with the assumption that the value in the where clause is for a
unique column.

If you want min(col2) and avg(col2) where col1=x you can get it without
a group by, the same as if you put col1<x - if you want an aggregate of
all records returned not the aggregate based on each value of col1.

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

#5Mark Woodward
pgsql@mohawksoft.com
In reply to: Shane Ambler (#4)
Re: Syntax bug? Group by?

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

#6Markus Schaber
schabi@logix-tt.com
In reply to: Mark Woodward (#1)
Re: Syntax bug? Group by?

Hi, Mark,

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Try:

SELECT 15 as ycis_id, min(tindex), avt(tindex) from y where ycis_id = 15;

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

#7Nolan Cafferky
Nolan.Cafferky@rbsinteractive.com
In reply to: Mark Woodward (#5)
Re: Syntax bug? Group by?

Mark Woodward wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

I think your point is that every non-aggregate column in the results of
the query also appears in the where clause and is given a single value
there, so conceivably, an all-knowing, all-powerful postgres could
recognize this and do the implied GROUP by on these columns.

I'm not in a position to give a definitive answer on this, but I suspect
that adjusting the query parser/planner to allow an implied GROUP BY
either gets prohibitively complicated, or fits too much of a special
case to be worth implementing.

select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = 15
group by
some_other_id;

Here, postgres would have to use the group by you specified, and also
recognize the single-valued constant assigned to ycis_id. Maybe not too
bad, but:

select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = some_single_valued_constant(foo, bar)
group by
some_other_id;

In this case, postgres doesn't know whether
some_single_valued_constant() will really return the same single value
for every tuple. Ultimately, as more complex queries are introduced, it
would become a lot simpler for the query writer to just specify the
group by columns instead of trying to guess it from the where clause.

Final note: I could also see situations where an implied group by would
silently allow a poorly written query to execute, instead of throwing an
error that suggests to the query writer that they did something wrong.

--
Nolan Cafferky
Software Developer
IT Department
RBS Interactive
nolan.cafferky@rbsinteractive.com

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Woodward (#5)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

AFAIK what you want is not per sql spec. What if you had instead written

select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id) = 15;

? I think you are expecting too much reasoning from the engine.

cheers

andrew

#9Markus Schaber
schabi@logix-tt.com
In reply to: Mark Woodward (#5)
Re: Syntax bug? Group by?

Hi, Mark,

Mark Woodward wrote:

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

I think that it's a lack of special-casing the = operator. Imagine
"where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably
user defined) operators on (probably user defined) datatypes.

The parser has no real knowledge what the operators do, it simply
requests one that returns a bool.

One could make the parser to special case the = operator, and maybe some
others, however I doubt it's worth the effort.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

#10Mark Woodward
pgsql@mohawksoft.com
In reply to: Markus Schaber (#9)
Re: Syntax bug? Group by?

Hi, Mark,

Mark Woodward wrote:

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you
will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query.
This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it
should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

I think that it's a lack of special-casing the = operator. Imagine
"where ycis_id>15" or "where ycis_id @|< $RECTANGLE" or other (probably
user defined) operators on (probably user defined) datatypes.

The parser has no real knowledge what the operators do, it simply
requests one that returns a bool.

One could make the parser to special case the = operator, and maybe some
others, however I doubt it's worth the effort.

I understand the SQL, and this isn't a "sql" question else it would be on
a different list, it is a PostgreSQL internals question and IMHO potential
bug.

The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

Should NOT require a "group by" to get ycis_id in the results.

#11Mark Woodward
pgsql@mohawksoft.com
In reply to: Markus Schaber (#6)
Re: Syntax bug? Group by?

Hi, Mark,

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Try:

SELECT 15 as ycis_id, min(tindex), avt(tindex) from y where ycis_id = 15;

This isn't a "SQL" question!!! This is a question of whether or not
PostgreSQL is correct in requiring a "group by" in the query. I assert
that since it is unabiguous as to what "ycis_id" should be, PostgreSQL
should not require a grouping.

#12Mark Woodward
pgsql@mohawksoft.com
In reply to: Andrew Dunstan (#8)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you
will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query.
This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it
should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

AFAIK what you want is not per sql spec. What if you had instead written

select ycis_id, min(tindex), avg(tindex) from y where frobnitz(ycis_id)
= 15;

? I think you are expecting too much reasoning from the engine.

Regardless, I can get the results I need and have already worked around
this. The reason why I posted the question to hackers was that I think it
is a bug.

The output column "ycis_id" is unabiguously a single value with regards to
the query. Shouldn't PostgreSQL "know" this? AFAIR, I think I've used this
exact type of query before either on PostgreSQL or another system, maybe
Oracle, and it did work.

#13Mark Woodward
pgsql@mohawksoft.com
In reply to: Nolan Cafferky (#7)
Re: Syntax bug? Group by?

Mark Woodward wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

I still assert that there will always only be one row to this query.
This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it
should
not need to be grouped.

My question, is it a syntactic technicality that PostgreSQL asks for a
"group by," or a bug in the parser?

I think your point is that every non-aggregate column in the results of
the query also appears in the where clause and is given a single value
there, so conceivably, an all-knowing, all-powerful postgres could
recognize this and do the implied GROUP by on these columns.

Not exactly.

I'm not in a position to give a definitive answer on this, but I suspect
that adjusting the query parser/planner to allow an implied GROUP BY
either gets prohibitively complicated, or fits too much of a special
case to be worth implementing.

select
ycis_id,
some_other_id,
min(tindex),
avg(tindex)
from
y
where
ycis_id = 15
group by
some_other_id;

This is not, in fact, like the example I gave and confuses the point I am
trying to make.

The original query:
select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ycis_id is unambiguous and MUST be only one value, there should be no
requirement of grouping. In fact, a "group by" implies multiple result
rows in an aggregate query.

As I said in other branches of this thread, this isn't a SQL question, it
is a question of whether or not the PostgreSQL parser is correct or not,
and I do not believe that it is working correctly.

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Woodward (#11)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Hi, Mark,

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be used
in an aggregate function

If I am asking for a specific column value, should I, technically
speaking, need to group by that column?

Try:

SELECT 15 as ycis_id, min(tindex), avt(tindex) from y where ycis_id = 15;

This isn't a "SQL" question!!! This is a question of whether or not
PostgreSQL is correct in requiring a "group by" in the query. I assert
that since it is unabiguous as to what "ycis_id" should be, PostgreSQL
should not require a grouping.

Of course it's an SQL question. How can you ask about the correctness of
a piece of text which purports to be SQL and then say it isn't an SQL
question?

If you can point to a place in the spec or our docs that sanctions the
usage you expect, then please do so, Until then I (and I suspect
everyone else) will persist in saying it's not a bug.

cheers

andrew

#15Shane Ambler
pgsql@007Marketing.com
In reply to: Mark Woodward (#5)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Stephen Frost wrote:

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

But back to the query the issue comes in that the ycis_id value is
included with the return values requested (a single row value with
aggregate values that isn't grouped) - if ycis_id is not unique you will
get x number of returned tuples with ycis_id=15 and the same min() and
avg() values for each row.
Removing the ycis_id after the select will return the aggregate values
you want without the group by.

I still assert that there will always only be one row to this query. This
is an aggregate query, so all the rows with ycis_id = 15, will be
aggregated. Since ycis_id is the identifying part of the query, it should
not need to be grouped.

SELECT ycis_id FROM table WHERE ycis_id=15; returns a single tuple when
ycis_id is unique otherwise multiple tuples
which means that SELECT ycis_id is technically defined as returning a
multiple row tuple even if ycis_id is unique - the data in the tuple
returned is data directly from one table row

SELECT max(col2) FROM table WHERE ycis_id=15; returns an aggregate tuple

SELECT ycis_id FROM table WHERE ycis_id=15 GROUP BY ycis_id; returns an
aggregate tuple (aggregated with the GROUP BY clause making the ycis_id
after the SELECT an aggregate as well)

You can't have both a single tuple and an aggregate tuple returned in
the one statement. If you want the column value of ycis_id in the
results you need the group by to unify all returned results as being
aggregates.

--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Woodward (#1)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be
used in an aggregate function

This would require a great deal of special-casing, in particular
knowledge of the = operator, and then the restriction to a particular
form of the WHERE clause. For overall consistency, I don't think this
should be allowed.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#17Chris Campbell
chris@bignerdranch.com
In reply to: Peter Eisentraut (#16)
Re: Syntax bug? Group by?

On Oct 17, 2006, at 15:19, Peter Eisentraut wrote:

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be
used in an aggregate function

This would require a great deal of special-casing, in particular
knowledge of the = operator, and then the restriction to a particular
form of the WHERE clause. For overall consistency, I don't think this
should be allowed.

In this particular case, the client constructing the query *knows*
the value of ycis_id (since the client is generating the "ycis_id =
15" clause). It's technically just a waste of bandwidth and server
resources to recalculate it. If you really want to replicate the
output of the query you proposed, you could rewrite it on the client as:

select 15 as ycis_id, min(tindex), avg(tindex) from y where
ycis_id = 15;

You could argue that the server should do this for you, but it seems
ugly to do in the general case. And, like Peter points out, would
need a lot of special-casing. I guess the parser could do it for
expressions in the SELECT clause that exactly match expressions in
the WHERE clause.

Thanks!

- Chris

#18Joe Sunday
sunday@csh.rit.edu
In reply to: Mark Woodward (#12)
Re: Syntax bug? Group by?

On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote:

The output column "ycis_id" is unabiguously a single value with regards to
the query. Shouldn't PostgreSQL "know" this? AFAIR, I think I've used this
exact type of query before either on PostgreSQL or another system, maybe
Oracle, and it did work.

Doesn't work in Oracle 10g:

SELECT ycis_id, tindex from x where ycis_id = 15;
YCIS_ID TINDEX
======= ======
15 10
15 20

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
ORA-00937: not a single-group group function

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP BY ycis_id;
YCIS_ID MIN(TINDEX) AVG(TINDEX)
======= =========== ===========
15 10 15

--Joe

--
Joe Sunday <sunday@csh.rit.edu> http://www.csh.rit.edu/~sunday/
Computer Science House, Rochester Inst. Of Technology

#19Mark Woodward
pgsql@mohawksoft.com
In reply to: Joe Sunday (#18)
Re: Syntax bug? Group by?

On Tue, Oct 17, 2006 at 02:41:25PM -0400, Mark Woodward wrote:

The output column "ycis_id" is unabiguously a single value with regards
to
the query. Shouldn't PostgreSQL "know" this? AFAIR, I think I've used
this
exact type of query before either on PostgreSQL or another system, maybe
Oracle, and it did work.

Doesn't work in Oracle 10g:

SELECT ycis_id, tindex from x where ycis_id = 15;
YCIS_ID TINDEX
======= ======
15 10
15 20

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15;
ORA-00937: not a single-group group function

SELECT ycis_id, min(tindex), avg(tindex) from x where ycis_id = 15 GROUP
BY ycis_id;
YCIS_ID MIN(TINDEX) AVG(TINDEX)
======= =========== ===========
15 10 15

That's interesting. I am digging through the SQL99 spec, and am trying to
find a definitive answer.

#20Mark Woodward
pgsql@mohawksoft.com
In reply to: Peter Eisentraut (#16)
Re: Syntax bug? Group by?

Mark Woodward wrote:

Shouldn't this work?

select ycis_id, min(tindex), avg(tindex) from y where ycis_id = 15;

ERROR: column "y.ycis_id" must appear in the GROUP BY clause or be
used in an aggregate function

This would require a great deal of special-casing, in particular
knowledge of the = operator, and then the restriction to a particular
form of the WHERE clause. For overall consistency, I don't think this
should be allowed.

Well, this started out as a "huh, that's funny, that should work, is that
a bug?" and is turning into a search through the SQL99 spec for a clear
answer. I've already worked around it, but to me, at least, it seems it
should work.

#21Mark Woodward
pgsql@mohawksoft.com
In reply to: Chris Campbell (#17)
#22Martijn van Oosterhout
kleptog@svana.org
In reply to: Mark Woodward (#20)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
#24Shane Ambler
pgsql@007Marketing.com
In reply to: Joe Sunday (#18)
#25Krycek
krycek6@wp.pl
In reply to: Mark Woodward (#1)
#26Markus Schaber
schabi@logix-tt.com
In reply to: Mark Woodward (#10)