RFE: Column aliases in WHERE clauses
It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:
SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';
Is this the proper mailing list for such feature requests?
Thanks in advance,
Daniel Serodio
On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:
It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';Is this the proper mailing list for such feature requests?
I think this is explicitly disallowed by the spec.
And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php
Thanks in advance,
Daniel Serodio
-Ryan Kelly
On Sep 17, 2012, at 17:42, Ryan Kelly <rpkelly22@gmail.com> wrote:
On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:
It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';Is this the proper mailing list for such feature requests?
I think this is explicitly disallowed by the spec.
Just to be clear, the spec does not care where you post your feature requests...it is the feature that it disallows.
When in doubt the general list is a good choice and all the key people monitor it and will move the discussion elsewhere if warranted.
David J.
Ryan Kelly wrote:
On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:
It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter> 'a';Is this the proper mailing list for such feature requests?
I think this is explicitly disallowed by the spec.
And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php
Tom's explanation makes perfect sense, thanks for the pointer.
Regards,
Daniel Serodio
On Mon, Sep 17, 2012 at 3:10 PM, Daniel Serodio (lists)
<daniel.lists@mandic.com.br> wrote:
Ryan Kelly wrote:
On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:
It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';Is this the proper mailing list for such feature requests?
I think this is explicitly disallowed by the spec.
And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.phpTom's explanation makes perfect sense, thanks for the pointer.
This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:
SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';
Obviously, you can write this as:
SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';
This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).
Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.
Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.
Mike
Mike Christensen <mike@kitchenpc.com> writes:
This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:
SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';
Obviously, you can write this as:
SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';
This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).
Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.
Yes, it is. If you read the spec you'll find that the scope of
visibility of names defined in the SELECT list doesn't include WHERE.
It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:
1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).
2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.
3. If there's a GROUP BY, merge the surviving rows into groups.
4. If there's aggregate functions, compute those over the rows in
each group.
5. If there's a HAVING, filter the grouped rows according to that.
6. Evaluate the SELECT expressions for each remaining row.
7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.
(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)
The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions
are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.
This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.
Now, having said all that, if you try it you'll find that Postgres
does allow select column references in GROUP BY, using the model
you propose above of copying whatever expression is in SELECT into
GROUP BY. This is, to put it politely, a mistake that we are now
stuck with for backwards-compatibility reasons. It's not spec compliant
and it doesn't fit the language's conceptual model, but it's been that
way for long enough that we're not likely to take it out. We are not,
however, gonna introduce the same mistake elsewhere.
Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.
CTEs or sub-selects are a better answer for that. Each sub-select has
its own instance of the conceptual pipeline.
regards, tom lane
On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike Christensen <mike@kitchenpc.com> writes:
This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';Obviously, you can write this as:
SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.Yes, it is. If you read the spec you'll find that the scope of
visibility of names defined in the SELECT list doesn't include WHERE.It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.3. If there's a GROUP BY, merge the surviving rows into groups.
4. If there's aggregate functions, compute those over the rows in
each group.5. If there's a HAVING, filter the grouped rows according to that.
6. Evaluate the SELECT expressions for each remaining row.
7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions
are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.Now, having said all that, if you try it you'll find that Postgres
does allow select column references in GROUP BY, using the model
you propose above of copying whatever expression is in SELECT into
GROUP BY. This is, to put it politely, a mistake that we are now
stuck with for backwards-compatibility reasons. It's not spec compliant
and it doesn't fit the language's conceptual model, but it's been that
way for long enough that we're not likely to take it out. We are not,
however, gonna introduce the same mistake elsewhere.Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.CTEs or sub-selects are a better answer for that. Each sub-select has
its own instance of the conceptual pipeline.
Excellent information, Tom! I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..
Mike
On 09/18/2012 07:32 AM, Tom Lane wrote:
It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution.
I just wish they hadn't written it backwards!
It'd be much less confusing were it formulated as something like:
SELECT
FROM thetable
WHERE first_letter > 'a'
RESULTS left(value,1) AS first_letter
or something, where the order is more obvious. I really dislike the way
SQL is written not-quite-backwards.
--
Craig Ringer
On Mon, 2012-09-17 at 16:44 -0700, Mike Christensen wrote:
On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mike Christensen <mike@kitchenpc.com> writes:
[-----------]
It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.3. If there's a GROUP BY, merge the surviving rows into groups.
4. If there's aggregate functions, compute those over the rows in
each group.5. If there's a HAVING, filter the grouped rows according to that.
6. Evaluate the SELECT expressions for each remaining row.
7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions
[--------------]
are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.
[-----------------]
Excellent information, Tom! I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..
Yes.
But it puzzles me, if it *conceptually* would be a signifficant
misstake, when what Tom calls "select-column-reference" (I understand
as: the colunm name introduced on the select-list), would actually be
regarded by the SQL parser as "macro-definition". Just to place the
*string*, defined at select-list-level by "AS <name>", to wherever it's
used in the WHERE/ORDER/GROUP or HAVING clauses at earlier then
select-list-evaluation processing stage.
Actual Tom's example(1):
SELECT 1/x AS inverse FROM data WHERE x <> 0;
extended to (2):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND 1/x > 20;
could be written by user as (3):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND inverse > 20;
but token/replaced to its form (2) before WHERE evaluation.
-R
"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:
Craig> I just wish they hadn't written it backwards!
Craig> It'd be much less confusing were it formulated as something
Craig> like:
Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letter
Craig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.
It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.
--
Eden Cardim
http://insoli.de
On Tue, Sep 18, 2012 at 4:44 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
On 09/18/2012 07:32 AM, Tom Lane wrote:
It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution.I just wish they hadn't written it backwards!
It'd be much less confusing were it formulated as something like:
SELECT
FROM thetable
WHERE first_letter > 'a'
RESULTS left(value,1) AS first_letteror something, where the order is more obvious. I really dislike the way SQL
is written not-quite-backwards.
I agree, SQL has its quirks. There are general principles, but they're
all violated somewhere... "UNION ALL" annoys me some by breaking the
rule that more words = more work, which is almost universally adhered
to elsewhere. But such is the nature of specifications.
ChrisA
On Tue, Sep 18, 2012 at 2:53 AM, Eden Cardim <eden@insoli.de> wrote:
"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:
Craig> I just wish they hadn't written it backwards!
Craig> It'd be much less confusing were it formulated as something
Craig> like:Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letterCraig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.
Oh no, this debate again..
I do admit LINQ kind of threw me for a loop as they took the other
approach (from f in foo where f.id > 5 select f), which makes you
think about the collection you're working with first. I usually think
about the table first when I'm writing a query.
I can also say if the table came before the columns, we'd probably
have a lot more SQL editors with auto-complete that worked :)
Mike
"Mike" == Mike Christensen <mike@kitchenpc.com> writes:
Mike> I can also say if the table came before the columns, we'd
Mike> probably have a lot more SQL editors with auto-complete that
Mike> worked :)
There's nothing stopping an editor from making you type the table
first though, it's easier to implement that in an editor than it is to
change the way people have been thinking about math for the last few
of centuries.
On Tue, Sep 18, 2012 at 11:20 AM, Eden Cardim <eden@insoli.de> wrote:
"Mike" == Mike Christensen <mike@kitchenpc.com> writes:
Mike> I can also say if the table came before the columns, we'd
Mike> probably have a lot more SQL editors with auto-complete that
Mike> worked :)There's nothing stopping an editor from making you type the table
first though, it's easier to implement that in an editor than it is to
change the way people have been thinking about math for the last few
of centuries.
No, I meant editors that auto-complete SQL statements for you as
you're typing them. Like Intellisense in Visual Studio.
Obviously you wouldn't want to type "select " and then see a list of
every column in your database, since it doesn't know what table you
want yet.
Mike
"Mike" == Mike Christensen <mike@kitchenpc.com> writes:
Mike> No, I meant editors that auto-complete SQL statements for
Mike> you as you're typing them. Like Intellisense in Visual
Mike> Studio.
Mike> Obviously you wouldn't want to type "select " and then see a
Mike> list of every column in your database, since it doesn't know
Mike> what table you want yet.
Again, you can type "select" and the have the editor expand the whole
thing into "select _ from [cursor is now here]" and present you with a
list of tables for the autocomplete, then after you've selected the
table, it goes back to the field part of the query. I have emacs
rigged to do this with yasnippet, except that it relies on a
handwritten function, not intellisense. Point stands, easier done in a
single editor as opposed to changing the semantics of math.
On Tue, Sep 18, 2012 at 2:53 AM, Eden Cardim <eden@insoli.de> wrote:
"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:
Craig> I just wish they hadn't written it backwards!
Craig> It'd be much less confusing were it formulated as something
Craig> like:Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letterCraig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.
Natural language semantics will get you into trouble though. After all, I
think Lisp follows natural language semantics remarkably closely if your
natural language is Irish Gaelic....
Best Wishes,
Chris Travers
Hi There,
I've snipped a piece for the daily digest because I take issue with what's asserted here as a reason for not allowing aliases in where clauses.
<< snip This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen. end snip>>
Irrespective of whether the standard prohibits aliases in where clauses, the reasoning here is irrelevant to the discussion at hand.
If I say:
SELECT 1/x AS inverse FROM data WHERE x <> 0 or inverse > 0.5 (for arguments sake) (in a SQL dialect that supports it)
then I must expect inverse to be evaluated for every row, exactly as if I said:
SELECT 1/x AS inverse FROM data WHERE x <> 0 or 1/x > 0.5
It's surely not the role of the standard to protect us from the consequences of our own folly.
Since some dialects support the idiom and others don't there can't be any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.
--
Steve Haresnape
Creative Integrity Ltd
Import Notes
Resolved by subject fallback
On 19/09/2012 04:57, Chris Travers wrote:
Natural language semantics will get you into trouble though. After all,
I think Lisp follows natural language semantics remarkably closely if
your natural language is Irish Gaelic....
Really? I haven't used Irish seriously since I left school - maybe I
should learn Lisp.... :-)
Ray.
--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie
On 09/19/2012 02:36 PM, Steve Haresnape wrote:
Since some dialects support the idiom and others don't there can't be
any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.
So you're saying that you want the implementation to effectively clone
the aliased SELECT term into the WHERE clause?
If so, what about functions with side-effects?
--
Craig Ringer
On Wed, 2012-09-19 at 20:36 +0800, Craig Ringer wrote:
On 09/19/2012 02:36 PM, Steve Haresnape wrote:
Since some dialects support the idiom and others don't there can't be
any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.So you're saying that you want the implementation to effectively clone
the aliased SELECT term into the WHERE clause?If so, what about functions with side-effects?
What about them: if they are put in the where clause by the user - e.g.
not from unaliasing?
Just bug-traceing will be more difficult from alias obfuscating effects.
That's all.
-R