Improving GROUP BY?

Started by Oliver Siegmarover 20 years ago5 messagesgeneral
Jump to latest
#1Oliver Siegmar
oliver@siegmar.net

Hello,

as you all may know, the GROUP BY clause in PostgreSQL has to be parameterized
with all fields that appear in the SELECT clause (except aggregates).

So

SELECT id, foo, sum(bar)
FROM table1
GROUP BY id, foo;

is perfectly valid, where

SELECT id, foo, sum(bar)
FROM table1
GROUP BY id;

would fail with something like '"foo" must appear in the GROUP BY clause or be
used in an aggregate function'.

I understand that behaviour in the case that the 'id' field wouldn't be a
primary (or unique) identifier of the table.

So my questions are

- Would it be possible to add a feature to PostgreSQL to allow grouping by
unique identifiers only?
- Is this requirement (all fields in GROUP BY) based on the SQL standard? If
yes, by which version?

Best

Oliver

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Oliver Siegmar (#1)
Re: Improving GROUP BY?

Am Freitag, 30. September 2005 09:03 schrieb Oliver Siegmar:

- Would it be possible to add a feature to PostgreSQL to allow grouping by
unique identifiers only?
- Is this requirement (all fields in GROUP BY) based on the SQL standard?

What you describe is called "functional dependencies" in the SQL standard.
(The query processor would recognize that the non-primary key columns
"functionally depend" on the primary key and therefore do not have to be
grouped separately.) This is definitely a valid feature but PostgreSQL does
not implement it yet.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#3John Seberg
johnseberg@yahoo.com
In reply to: Oliver Siegmar (#1)
Re: Improving GROUP BY?
--- Oliver Siegmar <oliver@siegmar.net> wrote:

Hello,

as you all may know, the GROUP BY clause in
PostgreSQL has to be parameterized
with all fields that appear in the SELECT clause
(except aggregates).

So

SELECT id, foo, sum(bar)
FROM table1
GROUP BY id, foo;

is perfectly valid, where

SELECT id, foo, sum(bar)
FROM table1
GROUP BY id;

would fail with something like '"foo" must appear in
the GROUP BY clause or be
used in an aggregate function'.

I understand that behaviour in the case that the
'id' field wouldn't be a
primary (or unique) identifier of the table.

So my questions are

- Would it be possible to add a feature to
PostgreSQL to allow grouping by
unique identifiers only?

There would seem to be no point in grouping by unique
ids - by definition, there would only be one member in
each group.

- Is this requirement (all fields in GROUP BY) based
on the SQL standard? If
yes, by which version?

I don't know about standards, but, this behavior is
common among several vendors I have used. If you want
to see one of the values of foo, in your example, use
min() or max(). Just be mindful that other values of
foo may exist in the grouping.

So, I fail to see your "improvement". I like the way
GROUP BY works just fine.

__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com

#4Oliver Siegmar
oliver@siegmar.net
In reply to: Peter Eisentraut (#2)
Re: Improving GROUP BY?

On Friday 30 September 2005 15:23, Peter Eisentraut wrote:

Am Freitag, 30. September 2005 09:03 schrieb Oliver Siegmar:

- Would it be possible to add a feature to PostgreSQL to allow grouping
by unique identifiers only?
- Is this requirement (all fields in GROUP BY) based on the SQL standard?

What you describe is called "functional dependencies" in the SQL standard.
(The query processor would recognize that the non-primary key columns
"functionally depend" on the primary key and therefore do not have to be
grouped separately.) This is definitely a valid feature but PostgreSQL
does not implement it yet.

Could you add it to the TODO list? ;-)

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: Improving GROUP BY?

Peter Eisentraut <peter_e@gmx.net> writes:

Am Freitag, 30. September 2005 09:03 schrieb Oliver Siegmar:

- Is this requirement (all fields in GROUP BY) based on the SQL standard?

What you describe is called "functional dependencies" in the SQL standard.
(The query processor would recognize that the non-primary key columns
"functionally depend" on the primary key and therefore do not have to be
grouped separately.) This is definitely a valid feature but PostgreSQL does
not implement it yet.

Note also that the "functional dependencies" stuff was added in SQL99.
Postgres' current behavior corresponds to the SQL92 standard.

regards, tom lane