Why does aggregate query allow select of non-group by or aggregate values?

Started by Jack Christensenover 14 years ago5 messagesgeneral

Jack Christensen

jackc@hylesanderson.edu

over 14 years ago

CREATE TABLE people(
id serial PRIMARY KEY,
name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran
into this using an array produced by a subquery as a column in the
select of an aggregate query, but I was able to boil it down to this
contrived example. Shouldn't any expression that is not in the group by
or an aggregate function be rejected?

What am I not understanding?

Thanks.

--
Jack Christensen
jackc@hylesanderson.edu

alonup8tb@gmail.com

over 14 years ago

In reply to: Jack Christensen (#1)

Re: Why does aggregate query allow select of non-group by or aggregate values?

On Fri, Dec 9, 2011 at 5:48 PM, Jack Christensen <jackc@hylesanderson.edu>wrote:

CREATE TABLE people(
id serial PRIMARY KEY,
name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran
into this using an array produced by a subquery as a column in the select
of an aggregate query, but I was able to boil it down to this contrived
example. Shouldn't any expression that is not in the group by or an
aggregate function be rejected?

What am I not understanding?

Thanks.

--
Jack Christensen
jackc@hylesanderson.edu

I don't know the answer, but I would guess that as random() is not known
before hand , it has no reason being in the grouped by.

David G. Johnston

david.g.johnston@gmail.com

over 14 years ago

In reply to: Jack Christensen (#1)

Re: Why does aggregate query allow select of non-group by or aggregate values?

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Jack Christensen
Sent: Friday, December 09, 2011 5:48 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Why does aggregate query allow select of non-group by or
aggregate values?

CREATE TABLE people(
id serial PRIMARY KEY,
name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran into
this using an array produced by a subquery as a column in the select of an
aggregate query, but I was able to boil it down to this contrived example.
Shouldn't any expression that is not in the group by or an aggregate
function be rejected?

What am I not understanding?

Thanks.

----------------------------------------------------------------------------
-------

Functions are evaluated once for each row that it generated by the
surrounding query. This is particularly useful if the function in question
takes an aggregate as an input:

SELECT col1, array_processing_function( ARRAY_AGG( col2 ) )
FROM table
GROUP BY col1;

Without this particular behavior you would need to sub-query.

From a layman's perspective the reason why you cannot use non-aggregates
outside of GROUP BY it that it is ambiguous as to what value to output; with
an uncorrelated function call that is not the case.

David J.

adrian.klaver@aklaver.com

over 14 years ago

In reply to: Jack Christensen (#1)

Re: Why does aggregate query allow select of non-group by or aggregate values?

On 12/09/2011 02:48 PM, Jack Christensen wrote:

CREATE TABLE people(
id serial PRIMARY KEY,
name varchar NOT NULL
);

INSERT INTO people(name) VALUES('Adam'), ('Adam'), ('Adam'), ('Bill'),
('Sam'), ('Joe'), ('Joe');

SELECT name, count(*), random()
FROM people
GROUP BY name;

I would expect this query to cause an error because of random(). I ran
into this using an array produced by a subquery as a column in the
select of an aggregate query, but I was able to boil it down to this
contrived example. Shouldn't any expression that is not in the group by
or an aggregate function be rejected?

What am I not understanding?

http://www.postgresql.org/docs/9.0/interactive/sql-select.html#SQL-GROUPBY
"
Aggregate functions, if any are used, are computed across all rows
making up each group, producing a separate value for each group (whereas
without GROUP BY, an aggregate produces a single value computed across
all the selected rows). When GROUP BY is present, it is not valid for
the SELECT list expressions to refer to ungrouped columns except within
aggregate functions, since there would be more than one possible value
to return for an ungrouped column."

My guess, random() does not refer to a column, so it falls outside the
above criteria.

Thanks.

--
Adrian Klaver
adrian.klaver@gmail.com

Jack Christensen

jackc@hylesanderson.edu

over 14 years ago

In reply to: David G. Johnston (#3)

Re: Why does aggregate query allow select of non-group by or aggregate values?

On 12/9/2011 4:57 PM, David Johnston wrote:

Functions are evaluated once for each row that it generated by the
surrounding query. This is particularly useful if the function in question
takes an aggregate as an input:

SELECT col1, array_processing_function( ARRAY_AGG( col2 ) )
FROM table
GROUP BY col1;

Without this particular behavior you would need to sub-query.

From a layman's perspective the reason why you cannot use non-aggregates

outside of GROUP BY it that it is ambiguous as to what value to output; with
an uncorrelated function call that is not the case.

David J.

Thanks. This makes sense now. I also went back to the original query
that provoked this question. It had a correlated subquery in the select
statement. I thought that this could yield ambiguous results. But when I
examined it closely, all the correlated fields were included in the
group by of the outer query, and when I tried to use a non-grouped
column from the outer query I correctly got a ERROR: subquery uses
ungrouped column "foo" from outer query

Thanks again.

--
Jack Christensen
jackc@hylesanderson.edu