GROUP BY requirement

Started by Bill Moseleyover 20 years ago2 messagesgeneral
Jump to latest
#1Bill Moseley
moseley@hank.org

I'm wondering if adding a GROUP BY (as required by Postgres) will
change the results of a select on a view.

I have the following view which joins a "class" with a teacher. A
teacher is a "person" and I have an "instructors" link table.

CREATE VIEW class_list (id, class_time, instructor )
AS
SELECT DISTINCT ON(class.id)
class.id, class.class_time, person.first_name

FROM class, instructors, person
WHERE instructors.person = person.id
AND class.id = instructors.class;

I also have a table "registration" that links students with a class.
The registration table has a "reg_status" column to say if they are
confirmed or on the wait_list. So when showing the above I'd also
like to see how many students are confirmed and on the wait_list.

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor,
confirmed_cnt, wait_list_cnt)
AS
SELECT DISTINCT ON(class.id)
class.id, class.class_time, person.first_name,
sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,

FROM class, instructors, person, registration
WHERE instructors.person = person.id
AND class.id = instructors.class
AND class.id = registration.class

GROUP BY class.id, class.class_time, person.first_name;

PostgreSQL requires the GROUP BY. But, I'm not clear how the GROUP BY
might change the results between the two views above.

http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY

says:

When GROUP BY is present, it is not valid for the SELECT list
expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to
return for an ungrouped column.

Frankly, I cannot see how it might change results of a select between
the two views. Am I missing something?

--
Bill Moseley
moseley@hank.org

#2Scott Marlowe
smarlowe@g2switchworks.com
In reply to: Bill Moseley (#1)
Re: GROUP BY requirement

On Fri, 2005-08-26 at 14:39, Bill Moseley wrote:

I'm wondering if adding a GROUP BY (as required by Postgres) will
change the results of a select on a view.

I have the following view which joins a "class" with a teacher. A
teacher is a "person" and I have an "instructors" link table.

CREATE VIEW class_list (id, class_time, instructor )
AS
SELECT DISTINCT ON(class.id)
class.id, class.class_time, person.first_name

FROM class, instructors, person
WHERE instructors.person = person.id
AND class.id = instructors.class;

I also have a table "registration" that links students with a class.
The registration table has a "reg_status" column to say if they are
confirmed or on the wait_list. So when showing the above I'd also
like to see how many students are confirmed and on the wait_list.

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor,
confirmed_cnt, wait_list_cnt)
AS
SELECT DISTINCT ON(class.id)
class.id, class.class_time, person.first_name,
sum (CASE WHEN registration.reg_status = 1 THEN 1 ELSE 0 END) as confirmed_cnt,
sum (CASE WHEN registration.reg_status = 2 THEN 1 ELSE 0 END) as wait_list_cnt,

FROM class, instructors, person, registration
WHERE instructors.person = person.id
AND class.id = instructors.class
AND class.id = registration.class

GROUP BY class.id, class.class_time, person.first_name;

PostgreSQL requires the GROUP BY. But, I'm not clear how the GROUP BY
might change the results between the two views above.

http://www.postgresql.org/docs/8.0/static/sql-select.html#SQL-GROUPBY

says:

When GROUP BY is present, it is not valid for the SELECT list
expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to
return for an ungrouped column.

Frankly, I cannot see how it might change results of a select between
the two views. Am I missing something?

OK, distinct on suffers from this problem. Given the following simple
dataset:

mytable:
a | b
------
1 | 0
1 | 1

select distinct on (a) a,b from mytable;

One can see how the possible results are:

1,0 and 1,1, right? All depending on the order in which they are
fetched.

The same would be true if you could do a group by on a and select b:

select a,b from mytable group by a;

Right?

Now, if it's impossible for your dataset to return such sets, due to the
way it's built, it is likely not fully normalized. I.e. you have data
like this:

classid | instructorname | moreinfo...
--------------------------------------
1 | 'John Smith' | 'information'
1 | 'John Smith' | 'even more information'

and so on. Or your join is creating such a data set.