Why does "group by" need to match select fields?

Started by Omar Eljumailyabout 19 years ago7 messagesgeneral

omar2@omnicode.com

about 19 years ago

Sorry if this isn't exactly postgresql specific. I periodically run
into this problem, and I'm running into it now. I'm wondering if
there's something about "group by" that I don't understand. As an
example what I'd want to do is return the "id" value for the check to
each payee that has the highest amount. It seems like there's no
problem with ambiguity in logic, but postgresql + other sql servers balk
at it. The group by fields need to explicitly match the select fields
with the exception of the aggregate function(s?).

create table checks
{
id serial,
payee text,
amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work? Is there another way to get the id for the
record with the highest amount for each payee?

Thanks.

wmoran@collaborativefusion.com

about 19 years ago

In reply to: Omar Eljumaily (#1)

Re: Why does "group by" need to match select fields?

Omar Eljumaily <omar2@omnicode.com> wrote:

Sorry if this isn't exactly postgresql specific. I periodically run
into this problem, and I'm running into it now. I'm wondering if
there's something about "group by" that I don't understand. As an
example what I'd want to do is return the "id" value for the check to
each payee that has the highest amount. It seems like there's no
problem with ambiguity in logic, but postgresql + other sql servers balk
at it. The group by fields need to explicitly match the select fields
with the exception of the aggregate function(s?).

create table checks
{
id serial,
payee text,
amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work? Is there another way to get the id for the
record with the highest amount for each payee?

Because it's ambiguous. If you're grabbing max() for amount, which
id tuple do you want?

Perhaps the way you're storing your data, those answers aren't ambiguous,
but the database doesn't know that. Take this query as an example:

select max(amount), max(checknumber), payee from checks group by payee;

In that case, the highest checknumber and the highest check amount
probably won't come from the same tuple. If you were to throw in
there:

select max(amount), max(checknumber), payee, id from checks group by payee;

Which id does it give you? The one that matches max(amount) or the one
that matches max(checknumber)?

--
Bill Moran
Collaborative Fusion Inc.

omar2@omnicode.com

about 19 years ago

In reply to: Bill Moran (#2)

Re: Why does "group by" need to match select fields?

OK, I see what's going on. I can have more than one max(amount) with
the same amount and payee. Thanks so much. Like I said, it's sort of
dogged me off and on many times.

Thanks.

Bill Moran wrote:

Show quoted text

Omar Eljumaily <omar2@omnicode.com> wrote:

Sorry if this isn't exactly postgresql specific. I periodically run
into this problem, and I'm running into it now. I'm wondering if
there's something about "group by" that I don't understand. As an
example what I'd want to do is return the "id" value for the check to
each payee that has the highest amount. It seems like there's no
problem with ambiguity in logic, but postgresql + other sql servers balk
at it. The group by fields need to explicitly match the select fields
with the exception of the aggregate function(s?).

create table checks
{
id serial,
payee text,
amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work? Is there another way to get the id for the
record with the highest amount for each payee?

Because it's ambiguous. If you're grabbing max() for amount, which
id tuple do you want?

Perhaps the way you're storing your data, those answers aren't ambiguous,
but the database doesn't know that. Take this query as an example:

select max(amount), max(checknumber), payee from checks group by payee;

In that case, the highest checknumber and the highest check amount
probably won't come from the same tuple. If you were to throw in
there:

select max(amount), max(checknumber), payee, id from checks group by payee;

Which id does it give you? The one that matches max(amount) or the one
that matches max(checknumber)?

erik@myemma.com

about 19 years ago

In reply to: Omar Eljumaily (#3)

Re: Why does "group by" need to match select fields?

On Feb 28, 2007, at 6:57 PM, Omar Eljumaily wrote:

OK, I see what's going on. I can have more than one max(amount)
with the same amount and payee. Thanks so much. Like I said, it's
sort of dogged me off and on many times.

Thanks.

Bill Moran wrote:

Omar Eljumaily <omar2@omnicode.com> wrote:

Sorry if this isn't exactly postgresql specific. I periodically
run into this problem, and I'm running into it now. I'm
wondering if there's something about "group by" that I don't
understand. As an example what I'd want to do is return the "id"
value for the check to each payee that has the highest amount.
It seems like there's no problem with ambiguity in logic, but
postgresql + other sql servers balk at it. The group by fields
need to explicitly match the select fields with the exception of
the aggregate function(s?).

create table checks
{
id serial,
payee text,
amount double
};

select max(amount), payee, id from checks group by payee;

Why won't the above work? Is there another way to get the id for
the record with the highest amount for each payee?

Because it's ambiguous. If you're grabbing max() for amount, which
id tuple do you want?

Perhaps the way you're storing your data, those answers aren't
ambiguous,
but the database doesn't know that. Take this query as an example:

select max(amount), max(checknumber), payee from checks group by
payee;

In that case, the highest checknumber and the highest check amount
probably won't come from the same tuple. If you were to throw in
there:

select max(amount), max(checknumber), payee, id from checks group
by payee;

Which id does it give you? The one that matches max(amount) or
the one
that matches max(checknumber)?

Omar, note that in many cases you can do:

select c.amount, c.payee, c.id
from checks c
where c.amount = (select max(amount)
from checks
where payee=c.payee);

erik jones <erik@myemma.com>
sofware developer
615-296-0838
emma(r)

Martijn van Oosterhout

kleptog@svana.org

about 19 years ago

In reply to: Omar Eljumaily (#3)

Re: Why does "group by" need to match select fields?

On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote:

OK, I see what's going on. I can have more than one max(amount) with
the same amount and payee. Thanks so much. Like I said, it's sort of
dogged me off and on many times.

Note that in the special case of max/min, you can use things like ORDER
BY/LIMIT to acheive the effect you want. And DISTINCT ON () is also
useful in such situations.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

Martijn van Oosterhout

kleptog@svana.org

about 19 years ago

In reply to: Omar Eljumaily (#3)

Re: Why does "group by" need to match select fields?

On Wed, Feb 28, 2007 at 04:57:08PM -0800, Omar Eljumaily wrote:

OK, I see what's going on. I can have more than one max(amount) with
the same amount and payee. Thanks so much. Like I said, it's sort of
dogged me off and on many times.

Note that in the case of min/max you can use ORDER BY/LIMIT or DISTINCT
ON () to acheive the effect you want.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

Bruno Wolff III

bruno@wolff.to

about 19 years ago

In reply to: Omar Eljumaily (#1)

Re: Why does "group by" need to match select fields?

On Wed, Feb 28, 2007 at 16:19:02 -0800,
Omar Eljumaily <omar2@omnicode.com> wrote:

select max(amount), payee, id from checks group by payee;

Why won't the above work? Is there another way to get the id for the
record with the highest amount for each payee?

While the DISTINCT ON approach is probably best if you can live with a
Postgres specific solution, the general way to do this is use the group by
query to get a set of primary keys with aggregates and then you join this
back to the original table to get the other data.

Some databases will also recognize that you are grouping by a candidate
key and allow you to specify normal columns since they must all have the
same value for rows with the same candidate key value. Unfortunately Postgres
doesn't do that now.