Group By and wildcards...

Started by Jon Laphamabout 21 years ago11 messagesgeneral
Jump to latest
#1Jon Lapham
lapham@jandr.org

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ? I would like to use a wildcard "*".

Imagine tables a, b, c, d each with hundreds of columns.

As an example, I would like to write:

SELECT a.*, b.*, c.*, SUM(d.blah)
FROM a, b, c, d
WHERE <some join conditions>
GROUP BY a.*, b.*, c.*

Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
explicit column names of all the column in a, b, and c.

This becomes a maintenance nightmare as you add/drop column in these
tables...

Thanks for any advice on how to handle this,
-Jon

PS: I'm using postgresql v7.4.x

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

#2Bruno Wolff III
bruno@wolff.to
In reply to: Jon Lapham (#1)
Re: Group By and wildcards...

On Sat, Feb 19, 2005 at 12:07:12 -0200,
Jon Lapham <lapham@jandr.org> wrote:

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ? I would like to use a wildcard "*".

Imagine tables a, b, c, d each with hundreds of columns.

As an example, I would like to write:

SELECT a.*, b.*, c.*, SUM(d.blah)
FROM a, b, c, d
WHERE <some join conditions>
GROUP BY a.*, b.*, c.*

Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
explicit column names of all the column in a, b, and c.

This becomes a maintenance nightmare as you add/drop column in these
tables...

Thanks for any advice on how to handle this,
-Jon

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#2)
Re: Group By and wildcards...

Bruno Wolff III <bruno@wolff.to> writes:

Jon Lapham <lapham@jandr.org> wrote:

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ? I would like to use a wildcard "*".

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

Unfortunately, PG will still make him GROUP BY everything he wants to
use as a non-aggregated output column. This behavior is per SQL92
spec. SQL99 added some verbiage to the effect that you only need to
GROUP BY columns that the rest are functionally dependent on (this
covers primary keys and some other cases); but we haven't got round
to implementing that extension.

regards, tom lane

#4Jon Lapham
lapham@jandr.org
In reply to: Tom Lane (#3)
Re: Group By and wildcards...

Tom Lane wrote:

Bruno Wolff III <bruno@wolff.to> writes:

Jon Lapham <lapham@jandr.org> wrote:

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ? I would like to use a wildcard "*".

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

Bruno, this is true, but I want all the columns to appear in the output.

Unfortunately, PG will still make him GROUP BY everything he wants to
use as a non-aggregated output column. This behavior is per SQL92
spec. SQL99 added some verbiage to the effect that you only need to
GROUP BY columns that the rest are functionally dependent on (this
covers primary keys and some other cases); but we haven't got round
to implementing that extension.

Ugh.

Since I do not want to have to re-write all my aggregate function
containing queries upon modifications to the table definitions (and I do
not want to write multi-thousand character long SELECT statements),
maybe it is easier to use a temp table intermediary?

SELECT a.id AS aid, SUM(d.blah) AS sum_blah
INTO TEMPORARY TABLE foo
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>

followed by

SELECT *
FROM a, b, c, foo
WHERE <some join conditions linking a,b,c>
AND foo.aid=a.id

Ugly... ugly... any other ideas on how to do this? My table definitions
LITERALLY have hundreds of columns, and I need access to them all.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

#5Bruno Wolff III
bruno@wolff.to
In reply to: Tom Lane (#3)
Re: Group By and wildcards...

On Sat, Feb 19, 2005 at 12:40:40 -0500,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Bruno Wolff III <bruno@wolff.to> writes:

Jon Lapham <lapham@jandr.org> wrote:

When using queries with aggregate functions, is there any way to not
have to have to explicitly write all the columns names after the GROUP
BY ? I would like to use a wildcard "*".

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

Unfortunately, PG will still make him GROUP BY everything he wants to
use as a non-aggregated output column. This behavior is per SQL92
spec. SQL99 added some verbiage to the effect that you only need to
GROUP BY columns that the rest are functionally dependent on (this
covers primary keys and some other cases); but we haven't got round
to implementing that extension.

I forgot about that. However, if maintainance is the most important
consideration, then it is possible to use only the primary keys
to do the grouping and then join that result back to the original
tables to pick up the other columns. It should be possible to do
this without explicitly naming all of the columns. It will be slower
and more complicated, but this might be an acceptable trade off.

#6Russ Brown
pickscrape@gmail.com
In reply to: Jon Lapham (#4)
Re: Group By and wildcards...

Jon Lapham wrote:

Ugh.

Since I do not want to have to re-write all my aggregate function
containing queries upon modifications to the table definitions (and I do
not want to write multi-thousand character long SELECT statements),
maybe it is easier to use a temp table intermediary?

Ugly... ugly... any other ideas on how to do this? My table definitions
LITERALLY have hundreds of columns, and I need access to them all.

How about using a view? Create a view that contains only the ID columns,
and then a second view encorporating the first view which joins to the
appropriate tables and fetches the columns you want in the output of
your query.

Just a thought.

--

Russ.

#7Bruno Wolff III
bruno@wolff.to
In reply to: Jon Lapham (#4)
Re: Group By and wildcards...

On Sat, Feb 19, 2005 at 15:59:52 -0200,
Jon Lapham <lapham@jandr.org> wrote:

Since I do not want to have to re-write all my aggregate function
containing queries upon modifications to the table definitions (and I do
not want to write multi-thousand character long SELECT statements),
maybe it is easier to use a temp table intermediary?

SELECT a.id AS aid, SUM(d.blah) AS sum_blah
INTO TEMPORARY TABLE foo
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>

followed by

SELECT *
FROM a, b, c, foo
WHERE <some join conditions linking a,b,c>
AND foo.aid=a.id

Ugly... ugly... any other ideas on how to do this? My table definitions
LITERALLY have hundreds of columns, and I need access to them all.

Well if you are thinking about the above than you might be interested in
seeing a more sketched out example of what I was suggesting in my
followup after Tom's correction.

SELECT a.*, b.*, c.*, e.d1
FROM a, b, c,
(SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>
GROUP BY a1, b1, c1) AS e
WHERE
a.id = e.a1 AND
b.id = e.b1 AND
c.id = e.c1
;

#8Oisin Glynn
me@oisinglynn.com
In reply to: Jon Lapham (#1)
Re: Group By and wildcards...

This is a very NEWBIE suggestion. I am fully prepared to be laughed out of
town...

But the where clause defines the result of the aggregate function (in this
case the SUM)?

Is the only reason for needing the GROUP BY CLAUSE is because the aggregate
function demands it?

If so could something like the following work where we pass the where clause
conditions into the function and it performs the aggregate function and
returns.. I am guessing this would be extremely inefficient?

select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
where some conditions;

-- Warning complete gibberish pseudo code now follows

function my_cheating_sum(a.id,b.id,c.id)
select SUM(xxx) from a,b,c where some conditions;
end function;

----- Original Message -----
From: "Bruno Wolff III" <bruno@wolff.to>
To: "Jon Lapham" <lapham@jandr.org>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; <pgsql-general@postgresql.org>
Sent: Saturday, February 19, 2005 13:36
Subject: Re: [GENERAL] Group By and wildcards...

Show quoted text

On Sat, Feb 19, 2005 at 15:59:52 -0200,
Jon Lapham <lapham@jandr.org> wrote:

Since I do not want to have to re-write all my aggregate function
containing queries upon modifications to the table definitions (and I do
not want to write multi-thousand character long SELECT statements),
maybe it is easier to use a temp table intermediary?

SELECT a.id AS aid, SUM(d.blah) AS sum_blah
INTO TEMPORARY TABLE foo
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>

followed by

SELECT *
FROM a, b, c, foo
WHERE <some join conditions linking a,b,c>
AND foo.aid=a.id

Ugly... ugly... any other ideas on how to do this? My table definitions
LITERALLY have hundreds of columns, and I need access to them all.

Well if you are thinking about the above than you might be interested in
seeing a more sketched out example of what I was suggesting in my
followup after Tom's correction.

SELECT a.*, b.*, c.*, e.d1
FROM a, b, c,
(SELECT a.id AS a1, b.id AS b1 , c.id AS c1, sum(d) AS d1
FROM a, b, c, d
WHERE <some join conditions linking a,b,c,d>
GROUP BY a1, b1, c1) AS e
WHERE
a.id = e.a1 AND
b.id = e.b1 AND
c.id = e.c1
;

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#9Bruce Momjian
bruce@momjian.us
In reply to: Bruno Wolff III (#2)
Re: Group By and wildcards...

Bruno Wolff III <bruno@wolff.to> writes:

On Sat, Feb 19, 2005 at 12:07:12 -0200,
Jon Lapham <lapham@jandr.org> wrote:

SELECT a.*, b.*, c.*, SUM(d.blah)
FROM a, b, c, d
WHERE <some join conditions>
GROUP BY a.*, b.*, c.*

Instead of having to expand the "GROUP BY a.*, b.*, c.*" using the
explicit column names of all the column in a, b, and c.

This becomes a maintenance nightmare as you add/drop column in these
tables...

Don't those tables have primary keys? Grouping by the primay key of each
table will produce the same result set as grouping by all of the columns.

Actually it would be kind of nice to have this as a feature. Or mysql's
feature of treating any unnamed columns as something like DISTINCT ON.

However there are a few approaches for dealing with it. None of which are
perfect but if they match your needs they work well.

In the query above you could turn SUM(d.blah) into a subquery expression. This
works well as long as you don't have multiple aggregate queries on the same
table.

SELECT a.*,b.*,c.*,
(SELECT sum(blah) FROM d WHERE ...) AS d_sum
FROM a,b,c

This doesn't require a GROUP BY step which means it'll probably be faster. On
the other hand it effectively forces a nested loop scan on d which is not
necessarily the fastest. And if you have multiple aggregates postgres it
forces separate lookups for the same data. It would be nice to have some
feature for breaking out subquery expressions that return multiple rows into
multiple output columns. Something like:

SELECT a.*,b.*,c.*,
(SELECT sum(blah),avg(blah) FROM d WHERE ...) AS (d_sum,d_avg)
FROM a,b,c

You could also turn the above into a more complex join like:

SELECT *
FROM a,b,c,
(SELECT groupname, SUM(blah) as d_sum FROM d GROUP BY groupname) AS d
WHERE ...
AND c.groupname = d.groupname

This works well as long as you didn't have the aggregate function applying to
overlapping subsets of d before. (eg, it won't work for sum(product.price) if
multiple invoices can contain the same product).

alternatively you can do something like

SELECT *
FROM a,b,c,
(select a.id as a_id, b.id as b_id, c.id as c_id,
sum(blah) as d_sum
from a,b,c,d
where ...
group by a.id,b.id,c.id
) AS sub
WHERE a.id = a_id
AND b.id = b_id
AND c.id = c_id

But that's pretty silly and not usually necessary.

--
greg

#10Bruno Wolff III
bruno@wolff.to
In reply to: Oisin Glynn (#8)
Re: Group By and wildcards...

On Sat, Feb 19, 2005 at 14:02:34 -0500,
Oisin Glynn <me@oisinglynn.com> wrote:

But the where clause defines the result of the aggregate function (in this
case the SUM)?

Not really.

Is the only reason for needing the GROUP BY CLAUSE is because the aggregate
function demands it?

Note that there is also a join to a table d. So that values in d are
being summed up based on some connection from d to the other 3 tables.

If so could something like the following work where we pass the where clause
conditions into the function and it performs the aggregate function and
returns.. I am guessing this would be extremely inefficient?

select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
where some conditions;

If that function did a select from d, you could make this work, but it
would likely be much slower than doing it in one SQL statement.

#11Sim Zacks
sim@nospam.com
In reply to: Jon Lapham (#1)
Re: Group By and wildcards...

Even if the function did a select from d, it could still have plenty of
duplicates. To remove that possibility you would have to use the distinct
clause which is also generally less efficient then a group by.

"Bruno Wolff III" <bruno@wolff.to> wrote in message
news:20050219192033.GA24244@wolff.to...

On Sat, Feb 19, 2005 at 14:02:34 -0500,
Oisin Glynn <me@oisinglynn.com> wrote:

But the where clause defines the result of the aggregate function (in

this

case the SUM)?

Not really.

Is the only reason for needing the GROUP BY CLAUSE is because the

aggregate

function demands it?

Note that there is also a join to a table d. So that values in d are
being summed up based on some connection from d to the other 3 tables.

If so could something like the following work where we pass the where

clause

Show quoted text

conditions into the function and it performs the aggregate function and
returns.. I am guessing this would be extremely inefficient?

select A.*,B.*,C.*,my_cheating_sum(a.id,b.id,c.id) from a,b,c,
where some conditions;

If that function did a select from d, you could make this work, but it
would likely be much slower than doing it in one SQL statement.

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings