SQL feature requests
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias? For instance suppose that I have an orders table, and
one of the fields is userid. The following is unambiguous and is
legal in Oracle:
SELECT order_count
, count(*) as people
FROM (
SELECT count(*) as order_count
FROM orders
GROUP BY userid
)
GROUP BY order_count
It annoys me that it isn't legal in postgres. (Yes, I know how to fix
the query. But it still is an annoyance, and it comes up fairly often
in reporting purposes.)
2. Why is 'non-integer constant in GROUP BY' an error?
I find it inconvenient. For reporting purposes I often have to
dynamically build queries in code. An easy way to do that is just
interpolate in a set of possible statements which will either be empty
strings or have trailing commas. But then I need this (possibly
empty) list to have a valid group by statement at the end. In Oracle
I used to just write it like this:
SELECT ...
GROUP BY $possible_term_1
$possible_term_2
$possible_term_3
'end of possible groupings'
In postgres I either have to use a different strategy to build up
these strings, or else use a more complicated term to finish that off.
This works for now:
case when true then true end
but I don't know whether some future version of postgres might break
my code by banning that as well.
3. How hard would it be to have postgres ignore aliases in group by
clauses? Per my comments above, I often build complex queries in
code. I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine. So my
code has to copy the select terms. But I can't copy them exactly
because the select terms include lots of "...as foo" clauses that are
not allowed in a group by. So I have to store very similar terms to
use twice.
It would be nice if I could just make the group by look like the
select, and have the (obviously irrelevant) aliases just be ignored.
4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like. Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.
For example
SELECT foo, count(*)
FROM bar
would be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY foo
If I write a query with an aggregate function in the select, better
than 95% of the time this is the group by clause that I want. (This
email has one of the few exceptions.) In the remaining cases I could
easily add the extra stuff in the group by to the select without
problems. Therefore if postgres could just insert the obvious group
by clause in, I would never again write the words "group by" when
working with postgres. And I predict that many other people would do
the same.
But it doesn't. So when working with postgres, just like every other
database that I've used, I have to constantly type in group by clauses
with entirely redundant information. (But they're not EXACTLY the
same as the select clauses that they are redundant with...)
Cheers,
Ben
Ben,
pgsql-sql is probably the appropriate list for future queries of this
nature.
Note that the below is my personal opinion; each PG developer has their
own.
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias? For instance suppose that I have an orders table, and
one of the fields is userid. The following is unambiguous and is
legal in Oracle:
I *think* the alias is a requirement of the SQL standard. Yes/No?
2. Why is 'non-integer constant in GROUP BY' an error?
Again, this needs to reference one of the SQL standards if you're
interested in a change of behavior. If we're out of standards compliance,
that's a strong argument. If we're in compliance, you have a pretty steep
hurdle to justify new syntax.
3. How hard would it be to have postgres ignore aliases in group by
clauses?
Unfortunately, I think this is also a SQL compliance issue. However, I'd
be more liable to support your arguments for it; it's much more obviously
useful functionality.
4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like. Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.
In addition to SQL compliance issues, we're reluctant to do anything which
makes implicit assumptions which could easily be wrong in PostgreSQL.
Such shortcutting all to often leads to runaway queries or wrong data when
the assumptions are incorrect. MySQL gives us lots of examples of what
can happen if you do too many things for convenience and compromise
correctness.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?
It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?
2. Why is 'non-integer constant in GROUP BY' an error?
This works for now:
case when true then true end
but I don't know whether some future version of postgres might break
my code by banning that as well.
The PostgreSQL developers generally tries hard to preserve backwards
compatibility, so I doubt the case expression as you have it would go
away (though I'm kind of surprised it's allowed). Am I wrong in
thinking that Oracle would accept the same format PostgreSQL does? In
that case, couldn't you use whatever method works in PostgreSQL in
Oracle? I haven't checked the SQL standard, but it seems unlikely
it'd allow something like
GROUP BY , , , ;
AIUI, Integers are only allowed because the SQL standard explicitly
allows you to refer to columns by the order they appear in the SELECT
list. Otherwise the GROUP BY items need to be column names.
Both 1 and 2 seem to me to be places where Oracle is likely deviating
from the standard. If you're targeting Oracle, then using Oracle-
specific syntax might be warranted. If you're hoping to target more
than one possible backend, I'd think it be better to use more
portable syntax (e.g., SQL-standard syntax) than expecting other
DBMSs to follow another's deviations. That's not to say PostgreSQL
does not have non-standard syntax: in places, it does. But it does
try to hew very closely to the standard.
Again, I wonder what EnterpriseDB does in this case?
3. How hard would it be to have postgres ignore aliases in group by
clauses? Per my comments above, I often build complex queries in
code. I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine. So my
code has to copy the select terms. But I can't copy them exactly
because the select terms include lots of "...as foo" clauses that are
not allowed in a group by. So I have to store very similar terms to
use twice.
Perhaps someone else knows what you're referring to here, but I'm
having a hard time without an example. Here's what I *think* you're
trying to say:
test=# select * from observation;
observation_id | record_id | score_id
----------------+-----------+----------
3240 | 1 | 1
3239 | 1 | 1
3238 | 1 | 2
3237 | 1 | 1
2872 | 2 | 1
2869 | 2 | 2
2870 | 2 | 1
2871 | 2 | 1
3218 | 3 | 2
3217 | 3 | 1
(10 rows)
test=# select record_id as foo, count(observation_id) from
observation group by record_id;
foo | count
-----+-------
3 | 2
2 | 4
1 | 4
(3 rows)
test=# select record_id as foo, count(observation_id) from
observation group by foo;
foo | count
-----+-------
3 | 2
2 | 4
1 | 4
(3 rows)
test=# select record_id as foo, count(observation_id) as bar from
observation group by foo;
foo | bar
-----+-----
3 | 2
2 | 4
1 | 4
(3 rows)
test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id;
foo | bar
-----+-----
3 | 2
2 | 4
1 | 4
(3 rows)
test=# select version();
version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
build 5367)
(1 row)
I'm not getting an error in any permutation that I can think of. What
am I missing?
Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.For example
SELECT foo, count(*)
FROM barwould be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY foo
It's been discussed before. I don't believe it's been rejected out of
hand (though you can check the archives), just that no one's gotten
around to it. (Don't know what the SQL-spec says on this point.)
I'm not trying to dismiss your points, just trying to address them.
I'm interested to hear what others have to say.
Michael Glaesemann
grzm seespotcode net
"Ben Tilly" <btilly@gmail.com> writes:
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias? For instance suppose that I have an orders table, and
one of the fields is userid. The following is unambiguous and is
legal in Oracle:
Thank you, this is one of my top pet peeves but when I proposed changing it I
was told nobody's complained. Now we have at least one user complaint, any
others out there?
2. Why is 'non-integer constant in GROUP BY' an error?
Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.
3. How hard would it be to have postgres ignore aliases in group by
clauses?
That sounds like a strange idea.
4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like. Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.For example
SELECT foo, count(*)
FROM barwould be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY foo
I agree this would be convenient but it seems too scary to actually go
anywhere. What would you group by in the case of:
SELECT a+b, count(*) FROM bar
Should it group by a,b or a+b ?
Also, this might be a bit shocking for MySQL users who are accustomed to
MySQL's non-standard extension for the same syntax. There it's treated as an
assertion that the columns are equal for all records in a group or at least
that it doesn't matter which such value is returned, effectively equivalent to
our DISTINCT ON feature.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?
I can well believe that the standard says that you must accept
subqueries with aliases. But does it say you must reject subqueries
without aliases? I strongly doubt that.
I have no clue what EnterpriseDB does.
2. Why is 'non-integer constant in GROUP BY' an error?
This works for now:
case when true then true end
but I don't know whether some future version of postgres might break
my code by banning that as well.The PostgreSQL developers generally tries hard to preserve backwards
compatibility, so I doubt the case expression as you have it would go
away (though I'm kind of surprised it's allowed). Am I wrong in
thinking that Oracle would accept the same format PostgreSQL does? In
The reason for my comparing to Oracle is that I used to work at an
Oracle shop. I now work at a postgres shop. Portability is not my
issue, just the annoyances that I experienced moving from one to the
other.
As for whether that case expression would go away, that it is allowed
is such an obscure feature that I doubt anyone changing that code
would notice if it was removed.
that case, couldn't you use whatever method works in PostgreSQL in
Oracle? I haven't checked the SQL standard, but it seems unlikely
it'd allow something likeGROUP BY , , , ;
That's not what Oracle accepts that postgres does not. What Oracle accepts is:
...
GROUP BY 'foo';
AIUI, Integers are only allowed because the SQL standard explicitly
allows you to refer to columns by the order they appear in the SELECT
list. Otherwise the GROUP BY items need to be column names.
Need to be?
The SQL-92 standard is clear that you must accept a list of column
names. It is also clear that a column name must be be of the form
field or table.field. In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.
But every reasonable database that I know - including postgres - allows that.
The standard very wisely does not forbid extensions. Every database
has extensions. In some cases, such as allowing trim(foo.bar) in a
group by clause, some extensions are so common as to be a standard.
(I don't have a copy of any later standards so I don't know whether
that has since been explicitly allowed.) Therefore the real question
is how much farther than the standard you go.
Postgres explicitly disallows a constant character expression. But it
allows the constant case expression that I gave. It would be nice for
me to not have to remember that very obscure and convoluted case.
Both 1 and 2 seem to me to be places where Oracle is likely deviating
from the standard. If you're targeting Oracle, then using Oracle-
specific syntax might be warranted. If you're hoping to target more
than one possible backend, I'd think it be better to use more
portable syntax (e.g., SQL-standard syntax) than expecting other
DBMSs to follow another's deviations. That's not to say PostgreSQL
does not have non-standard syntax: in places, it does. But it does
try to hew very closely to the standard.
The queries that I'm writing are not hoping to target more than one
database at one company.
Again, I wonder what EnterpriseDB does in this case?
No clue.
3. How hard would it be to have postgres ignore aliases in group by
clauses? Per my comments above, I often build complex queries in
code. I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine. So my
code has to copy the select terms. But I can't copy them exactly
because the select terms include lots of "...as foo" clauses that are
not allowed in a group by. So I have to store very similar terms to
use twice.Perhaps someone else knows what you're referring to here, but I'm
having a hard time without an example. Here's what I *think* you're
trying to say:
What I'm trying to say is that it would be convenient for me to be
able to write:
select bar as "baz"
, count(*) as "some count"
from foo
group by bar as "baz"
That's not allowed right now because as is not allowed in a group by statement.
[...]
Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.For example
SELECT foo, count(*)
FROM barwould be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY fooIt's been discussed before. I don't believe it's been rejected out of
hand (though you can check the archives), just that no one's gotten
around to it. (Don't know what the SQL-spec says on this point.)
I don't know what the SQL spec says, but I know (having talked to
other developers) that many people would find it very nice.
I'm not trying to dismiss your points, just trying to address them.
I'm interested to hear what others have to say.
Cheers,
Ben
On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Ben Tilly" <btilly@gmail.com> writes:
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias? For instance suppose that I have an orders table, and
one of the fields is userid. The following is unambiguous and is
legal in Oracle:Thank you, this is one of my top pet peeves but when I proposed changing it I
was told nobody's complained. Now we have at least one user complaint, any
others out there?
Always happy to complain. :-)
2. Why is 'non-integer constant in GROUP BY' an error?
Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.
If your implementation accepts:
group by case when true then 'foo' end
how much harder can it be to accept:
group by 'foo'
?
3. How hard would it be to have postgres ignore aliases in group by
clauses?That sounds like a strange idea.
It is a strange idea, but it makes dynamically building queries
easier. Right now I'm following a strategy of storing what I'm going
to insert in the select clause in one variable, and the group by
clause in another. So I need 2 variables for each dynamic field that
I might choose to group by and want to have a custom name for. With
this change I would only need one variable.
4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like. Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.For example
SELECT foo, count(*)
FROM barwould be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY fooI agree this would be convenient but it seems too scary to actually go
anywhere. What would you group by in the case of:SELECT a+b, count(*) FROM bar
Should it group by a,b or a+b ?
It should group by a+b. Which is to say, every field in the select
clause that currently triggers an error because it isn't in the group
by clause.
Also, this might be a bit shocking for MySQL users who are accustomed to
MySQL's non-standard extension for the same syntax. There it's treated as an
assertion that the columns are equal for all records in a group or at least
that it doesn't matter which such value is returned, effectively equivalent to
our DISTINCT ON feature.
I don't mind shocking MySQL users. ;-)
But seriously, if that objection is the barrier then I'd be happy to
see it be something that is explicitly turned on in the query. For
instance:
select autogroup bar, count(*) from foo
If that was available then I for one would type autogroup a lot more
often than group by. After all autogroup is about as hard to type,
and I don't have to type the redundant list of fields in the group by.
Cheers,
Ben
Ben Tilly wrote:
On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote:
"Ben Tilly" <btilly@gmail.com> writes:
2. Why is 'non-integer constant in GROUP BY' an error?
Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.If your implementation accepts:
group by case when true then 'foo' end
how much harder can it be to accept:
group by 'foo'
This is not about hardness of the implementation, but rather about
non-confusing behaviour I think.
AFAIK, "group by 1" means "group by the first selected column", not
"group all rows together". But "group by 'foo'" would carry the second
meaning - "group all rows together". This is so totally counter-intuitive,
that it's not even funny...
"group by case when true then 'foo' end" looks different enough compared to
"group by 1" to make this less of a footgun.
Seems that the "group by <integer>" syntax predates the appearance of
aliases in the sql standard...
greetings, Florian flug
On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?I can well believe that the standard says that you must accept
subqueries with aliases. But does it say you must reject subqueries
without aliases? I strongly doubt that.
If I'm reading my draft copy of the SQL:2003 spec right (and there's
a good chance that I'm not, as it's not the easiest document for me
to parse), aliases *are* required.
From 5WD-02-Foundation-2003-09 (the fifth working draft)
7.5 <from clause>
<from clause> ::= FROM <table reference list>
<table reference list> ::=
<table reference> [ { <comma> <table reference> }... ]
7.6 <table reference>
<table reference> ::=
<table factor>
| <joined table>
<table factor> ::= <table primary> [ <sample clause> ]
<table primary> ::=
<table or query name> [ [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ] ]
| <derived table> [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ]
| <lateral derived table> [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ]
| <collection derived table> [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ]
| <table function derived table> [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ]
| <only spec> [ [ AS ] <correlation name>
[ <left paren> <derived column list><right paren> ] ]
| <left paren> <joined table> <right paren>
<derived table> ::= <table subquery>
<correlation name> I believe is what we commonly refer to as an
alias. I think <table or query name> is a table or view name, and
<derived table> is a subquery. <correlation name> is optional for
<table or query name> but not for <derived table>. The fact that the
standard clearly makes it optional in some cases and required in
others is pretty clear that <derived table> without a <correlation
name> is to be rejected, don't you think?
I have no clue what EnterpriseDB does.
In case it wasn't clear, the reason I bring it up is that
EnterpriseDB, while working from a PostgreSQL base, strives for
Oracle compatibility.
that case, couldn't you use whatever method works in PostgreSQL in
Oracle? I haven't checked the SQL standard, but it seems unlikely
it'd allow something likeGROUP BY , , , ;
That's not what Oracle accepts that postgres does not. What
Oracle accepts is:...
GROUP BY 'foo';
Thanks for clarifying. If it wasn't obvious by now, you know I'm not
familiar with Oracle :)
AIUI, Integers are only allowed because the SQL standard explicitly
allows you to refer to columns by the order they appear in the SELECT
list. Otherwise the GROUP BY items need to be column names.Need to be?
The SQL-92 standard is clear that you must accept a list of column
names. It is also clear that a column name must be be of the form
field or table.field.
The 2003 draft (same as above) seems to agree with the SQL92 standard:
7.9 <group by clause>
Format
<group by clause> ::=
GROUPBY [ <set quantifier> ] <grouping element list>
<grouping element list> ::=
<grouping element> [ { <comma> <grouping element> }... ]
<grouping element> ::=
<ordinary grouping set>
| <rollup list>
| <cube list>
| <grouping sets specification>
| <empty grouping set>
<ordinary grouping set> ::=
<grouping column reference>
| <left paren> <grouping column reference list><right paren>
<grouping column reference> ::=
<column reference> [ <collate clause> ]
<grouping column reference list> ::=
<grouping column reference> [ { <comma><grouping column
reference> }... ]
<rollup list> ::=
ROLLUP<left paren> <ordinary grouping set list><right paren>
<ordinary grouping set list> ::=
<ordinary grouping set> [ { <comma> <ordinary grouping set> }... ]
<cube list> ::=
CUBE<left paren> <ordinary grouping set list><right paren>
<grouping sets specification> ::=
GROUPINGSETS <left paren> <grouping set list><right paren>
<grouping set list> ::=
<grouping set> [ { <comma> <grouping set> }... ]
<grouping set> ::=
<ordinary grouping set>
| <rollup list>
| <cube list>
| <grouping sets specification>
| <empty grouping set>
<empty grouping set> ::= <left paren><right paren>
6.7 <column reference>
Format
<column reference> ::=
<basic identifier chain>
| MODULE<period> <qualified identifier><period> <column name>
There'd have to be a pretty strong reason to extend this, more than
just a convenience, I should think.
In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.But every reasonable database that I know - including postgres -
allows that.
Can you give an example of something like this working in PostgreSQL?
I get an error when I try to use a text value in a GROUP BY clause.
(Or are you referring specifically to the CASE expression corner case?)
test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id::text);
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id);
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
Postgres explicitly disallows a constant character expression. But it
allows the constant case expression that I gave. It would be nice for
me to not have to remember that very obscure and convoluted case.
I agree, and would move that it should be disallowed if there isn't a
reason for it to be maintained, for exactly the reason you give:
there shouldn't be such convoluted, and obscure corner case.
test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, true;
ERROR: non-integer constant in GROUP BY
test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, case when true then true end;
What I'm trying to say is that it would be convenient for me to be
able to write:select bar as "baz"
, count(*) as "some count"
from foo
group by bar as "baz"That's not allowed right now because as is not allowed in a group
by statement.
Ah, I see. Thanks for clarifying.
Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you
group by
all non-aggregate functions in the select.For example
SELECT foo, count(*)
FROM barwould be processed as:
SELECT foo, count(*)
FROM bar
GROUP BY fooIt's been discussed before. I don't believe it's been rejected out of
hand (though you can check the archives), just that no one's gotten
around to it. (Don't know what the SQL-spec says on this point.)I don't know what the SQL spec says, but I know (having talked to
other developers) that many people would find it very nice.
Since I had the spec open, I tried to look at this as well, though I
must admit I found it very tough going.
I think this is the key section:
10.9 <aggregate function>
...
Syntax Rules
...
4) The argument source of an <aggregate function> is
Case:
a) If AF is immediately contained in a <set function
specification>, then a table or group of a grouped table as
specified in Subclause 7.10, “<having clause>”, and Subclause 7.12,
“<query specification>”.b) Otherwise, the collection of rows in the current row's window
frame defined by the window structure descriptor identified by the
<window function> that simply contains AF, as defined in Subclause
7.11, “<window clause>”.
Now the <query specification> stuff is pretty dense, and the <window
clause> is currently impenetrable for me, so I just looked at the
<having clause> and <query specification> sections referenced above.
7.10 <having clause>
...
Syntax Rules
1) Let HC be the <having clause>. Let TE be the <table expression>
that immediately contains HC. If TE does not immediately contain a
<group by clause>, then “GROUPBY ()” is implicit. Let T be the
descriptor of the table defined by the <group by clause>GBC
immediately contained in TE and let R be the result of GBC.
So it looks like if there's no explicit GROUP BY, we're to assume
GROUP BY (), at least in the case of HAVING.
7.12 <query specification>
...
Syntax Rules
...
14) If <table expression> does not immediately contain a <group by
clause> and <table expression> is simply contained in a <query
expression> that is the aggregation query of some <set function
specification>, then GROUP BY () is implicit.
Again, assume GROUP BY () if no explicit GROUP BY with a <set
function specification>, which I understand to be some form of
aggregation.
15) If T is a grouped table, then let G be the set of grouping
columns of T. In each <value expression> contained in <select
list> , each column reference that references a column of T shall
reference some column C that is functionally dependent on G or
shall be contained in an aggregated argument of a <set function
specification> whose aggregation query is QS.
And a definition of grouped table is found:
4.14.2 Types of tables
...
A grouped table is a set of groups derived during the evaluation of
a <group by clause>. A group G is a collection of rows in which,
for every grouping column GC, if the value of GC in some row is not
distinct from GV, then the value of GC in every row is GV;
moreover, if R1 is a row in group G1 of grouped table GT and R2 is
a row in GT such that for every grouping column GC the value of GC
in R1 is not distinct from the value of GC in R2, then R2 is in G1.
Every row in GT is in exactly one group. A group may be considered
as a table. Set functions operate on groups.
So we're already talking about an explicit GROUP BY clause, and it
doesn't seem to leave much wiggle room grouping columns that aren't
explicitly listed. I'll readily admit I very easily could be reading
this wrong or have missed a section of the spec that discusses this
in more detail. (And that goes for any of my interpretations of the
spec!)
One place you and I differ is that I think you'd be in favor of many
more extensions in cases where the SQL spec doesn't explicitly say
"Thou shalt not do X". I'm a bit more conservative here: in my
opinion, if the SQL speaks to a particular issue (for example, a
WHERE clause), then you should pretty much keep to the letter of what
it says. Extensions are more for areas where the standard is silent.
For example, full text search will be added to PostgreSQL in 8.3.
AFAIK, the SQL spec doesn't talk about full text search at all, so
this is an area where extensions are a natural fit. Others, such as
PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd
suspect if it were proposed as a new feature now it would be
rejected. However, it's been a part of PostgreSQL for a long time, so
for historical reasons it remains. Likewise, I doubt "text" as a
synonym for "varchar" would be accepted.
Anyway, enough excursions into SpecLand for me. I'd welcome anyone
else taking a crack at it.
Michael Glaesemann
grzm seespotcode net
On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
If your implementation accepts:
group by case when true then 'foo' end
What would that mean? Regardless of whether or not it's accepted, it
should have *some* meaning.
It's not equivalent to GROUP BY "foo"
test=# select record_id as foo
, count(observation_id) as bar
from observation
group by case when true
then 'foo'
end;
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
test=# select record_id
, count(observation_id) as bar
from observation
group by case when true
then 'record_id'
end;
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
*This* seems like a bug:
test=# select record_id
, count(observation_id) as bar
from observation
group by record_id
, case when true
then 'foo'
end;
record_id | bar
-----------+-----
1 | 4
2 | 4
3 | 2
(3 rows)
And for good measure:
test=# select record_id
, count(observation_id) as bar
from observation
group by case when true
then record_id
end;
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
Michael Glaesemann
grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes:
On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
I can well believe that the standard says that you must accept
subqueries with aliases. But does it say you must reject subqueries
without aliases? I strongly doubt that.
If I'm reading my draft copy of the SQL:2003 spec right (and there's
a good chance that I'm not, as it's not the easiest document for me
to parse), aliases *are* required.
Yes, they are. It's been like that since SQL92 if not before (I've
never seen a copy of SQL89). I was meaning to go look up whether
SQL2003 had gotten any laxer, but thanks for saving me the trouble.
The question for us is whether we should exceed the spec by allowing
something it does not. The fundamental reason why not is that we would
have to invent an alias for the unnamed sub-select. That has several
drawbacks: more code to try to pick an unused alias, more bug reports
from people whose queries conflicted with whatever we picked. In return
for which, we're providing absolutely zip real increase in
functionality, and we're encouraging people to write unportable
SQL-spec-incompatible code. There is a future-proofing argument too:
if the committee ever decides this is a good idea (which may be unlikely
after 15 years, but you never know), they could well define the default
alias in a different way than we had done, and then we are stuck behind
the eight ball.
One could also assume that if the SQL committee has not made this
oh-so-obvious improvement, there is some darn good reason why not.
I'm not privy to their minutes so I don't know what arguments were
made there, but I think we can eliminate "they never thought of it"
as an explanation. Three generations of the spec have been written
specifically to exclude this.
In short, lots of downsides here, and not a whole lot of upside.
regards, tom lane
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Michael Glaesemann
Sent: Wednesday, August 22, 2007 5:17 PM
To: Ben Tilly
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] SQL feature requestsOn Aug 22, 2007, at 18:45 , Ben Tilly wrote:
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?2. Why is 'non-integer constant in GROUP BY' an error?
This works for now:
case when true then true end
but I don't know whether some future version of postgres might break
my code by banning that as well.
1. The SQL standard requires an alias for the subquery, but many
real-world SQL implementations relax this requirement in the case where
it is unambiguous. The standard doesn't say you have to reject
statements without the alias, it says only that you must accept the ones
that do. PostgreSQL has many things in its SQL where we accept things
that the standard doesn't require, and I don't see a good argument why
it shouldn't allow this.
2. The SQL standard only allows column names in group by lists, NOT
expressions. PostgreSQL extends the standard by allowing some, but not
all, expressions in the group by list (somewhat inconsistent, in my
view). Expressions in the group by list is actually a quite common
extension. But PostgreSQL also extends the standard by attaching
special meaning to simple integer constants in the group by list, and
treating them as column numbers from the select list. As far as I
remember, the SQL standard only allows that on ORDER BY clauses.
Anyway, it seems reasonable to allow expressions in the group by, such
as:
Select a*10, sum(b) from x group by a*10;
But once you allow that, it seems like you should allow any expression,
even degenerate ones such as
select 'foo',sum(b) from x group by 'foo';
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Chuck McDevitt
Sent: Wednesday, August 22, 2007 11:53 PM
To: Michael Glaesemann; Ben Tilly
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] SQL feature requests-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Michael Glaesemann
Sent: Wednesday, August 22, 2007 5:17 PM
To: Ben Tilly
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] SQL feature requestsOn Aug 22, 2007, at 18:45 , Ben Tilly wrote:
1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
does?2. Why is 'non-integer constant in GROUP BY' an error?
This works for now:
case when true then true end
but I don't know whether some future version of postgres might
break
my code by banning that as well.
1. The SQL standard requires an alias for the subquery, but many
real-world SQL implementations relax this requirement in the case
where
it is unambiguous. The standard doesn't say you have to reject
statements without the alias, it says only that you must accept the
ones
that do. PostgreSQL has many things in its SQL where we accept things
that the standard doesn't require, and I don't see a good argument why
it shouldn't allow this.2. The SQL standard only allows column names in group by lists, NOT
expressions. PostgreSQL extends the standard by allowing some, but
not
all, expressions in the group by list (somewhat inconsistent, in my
view). Expressions in the group by list is actually a quite common
extension. But PostgreSQL also extends the standard by attaching
special meaning to simple integer constants in the group by list, and
treating them as column numbers from the select list. As far as I
remember, the SQL standard only allows that on ORDER BY clauses.
Anyway, it seems reasonable to allow expressions in the group by, such
as:Select a*10, sum(b) from x group by a*10;
But once you allow that, it seems like you should allow any
expression,
even degenerate ones such as
select 'foo',sum(b) from x group by 'foo';
Just wanted to point out that the group by thing is really just
syntactic sugar.
You can always get a SQL standard compliant system to accept the
constants this way:
Select z,sum(b) from (
Select 'foo',b from x) a1 (z,b)
Group by z;
This will work (although with PostgreSQL, you might have to cast the
string constant to text, because it is "unknown" datatype).
If your implementation accepts:
group by case when true then 'foo' end
I think he meant: group by case when true then "foo" end
What would that mean? Regardless of whether or not it's accepted, it
should have *some* meaning.It's not equivalent to GROUP BY "foo"
Yea, but only because 'foo' is an identifier, and not a string constant.
test=# select record_id as foo
, count(observation_id) as bar
from observation
group by case when true
then 'foo'
end;
ERROR: column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
I think your example would be easier to understand if you removed the
quotes.
We don't detect the correctness of the above query. You can hardly say
that
this is a feature, but I am not inclined to see it as a troublesome bug
eighter.
Andreas
how much harder can it be to accept:
group by 'foo'
Presumably you meant group by "foo".
Imho pg should accept group by "foo". It could be part of a constant
removal, that also takes burden off the sort.
e.g. in "select x, count(*) from bar where x=5 group by x", x could be
removed since it is constant.
This is not about hardness of the implementation, but rather about
non-confusing behaviour I think.AFAIK, "group by 1" means "group by the first selected column", not
"group all rows together". But "group by 'foo'" would carry the second
meaning - "group all rows together".
Yes. I don't see the issue. 1 is imho sufficiently different even from
"1".
Pg is not alone in allowing column number in group by.
Andreas
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
how much harder can it be to accept:
group by 'foo'
Presumably you meant group by "foo".
No that's the whole point. He meant the constant string 'foo' not the column
identifier "foo" which certainly should work now.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
In short, lots of downsides here, and not a whole lot of upside.
I highly doubt the spec would ever conflict with allowing the user to elide
the aliases given that Oracle (and others?) have always allowed this. Moreover
if it's been 15 years without them adding it surely that argues we can be
pretty sure they won't add them?
This seems like a particularly petty case compared to a lot of other
extensions we do allow. Surely allowing arbitrary expressions in GROUP BY is
far more likely to conflict in the future given how it constrains our grammar.
And in theory that provides no added functionality over aside from programmer
convenience as well. There are tons of extensions to the spec in the Postgres
grammar. This would be one of the simplest safest ones.
The upside is the convenience which after all is the same upside as most of
our spec grammar extensions. Many many programmers are accustomed to entering
ad-hoc queries of this form and forcing them to enter an alias for no purpose
is just silly pedanticism from their point of view. The portability of ad-hoc
queries is meaningless and if you don't refer to the alias in the query then
it's truly pointless.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark escribi�:
The upside is the convenience which after all is the same upside as most of
our spec grammar extensions. Many many programmers are accustomed to entering
ad-hoc queries of this form and forcing them to enter an alias for no purpose
is just silly pedanticism from their point of view. The portability of ad-hoc
queries is meaningless and if you don't refer to the alias in the query then
it's truly pointless.
So there's the compromise: allow not specifying an alias only if it's
not used in the rest of the query at all, so the subquery would be
effectively anonymous.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
If your implementation accepts:
group by case when true then 'foo' end
What would that mean? Regardless of whether or not it's accepted, it
should have *some* meaning.
To my eyes it has a very clear meaning, we're grouping on an
expression that happens to be a constant. Which happens to be the
same for all rows. Which is a spectacularly useless thing to actually
do, but the ability to do it happens to be convenient when I'm looking
for something to terminate a series of commas in a dynamically built
query.
It's not equivalent to GROUP BY "foo"
I wouldn't want it to be. Strings and identifiers are very different things.
[...]
*This* seems like a bug:
test=# select record_id
, count(observation_id) as bar
from observation
group by record_id
, case when true
then 'foo'
end;
record_id | bar
-----------+-----
1 | 4
2 | 4
3 | 2
(3 rows)
Why does it seem like a bug to you?
Turn it around, and tell me in what way is its behaviour surprising to
someone who knows SQL. You asked to group on something that is the
same for all rows. That group by condition did nothing. (Except
rendered the syntax valid when it might not have been.) As I would
expect.
Furthermore ask yourself whether anyone who wrote that would likely
have written it by accident.
Cheers,
Ben
"Ben Tilly" <btilly@gmail.com> writes:
... But then I need this (possibly
empty) list to have a valid group by statement at the end. In Oracle
I used to just write it like this:
SELECT ...
GROUP BY $possible_term_1
$possible_term_2
$possible_term_3
'end of possible groupings'
FWIW, that will work if you write it as
'end of possible groupings'::text
I'm disinclined to accept it as-is because (a) it's highly likely to be
a mistake, and (b) there's no principled way to assign a datatype to the
expression, if we are to interpret it as an expression.
Basically anything but an unadorned constant will work there.
regards, tom lane
"Alvaro Herrera" <alvherre@commandprompt.com> writes:
Gregory Stark escribió:
The upside is the convenience which after all is the same upside as most of
our spec grammar extensions. Many many programmers are accustomed to entering
ad-hoc queries of this form and forcing them to enter an alias for no purpose
is just silly pedanticism from their point of view. The portability of ad-hoc
queries is meaningless and if you don't refer to the alias in the query then
it's truly pointless.So there's the compromise: allow not specifying an alias only if it's
not used in the rest of the query at all, so the subquery would be
effectively anonymous.
If what's not used in the rest of the query? If you haven't specified the
alias what would you use? Surely even if we did generate an alias name nobody
would think the generated name was guaranteed to be stable and reference it?
I think the compromise is to instead of generating aliases at all just use an
alias like "*Anonymous Subquery*" and add a boolean flag indicating that that
range table is anonymous and not a valid target for references. I started
doing that a while back but got distracted (and discouraged since it seemed
not to have widespread support).
IMHO even generating non-anonymous aliases like "*Anonymous Subquery*1" would
be fine but I'm happy to have a flag hiding them too.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com