RFE: Column aliases in WHERE clauses

Started by Daniel Serodio (lists)over 13 years ago26 messagesgeneral
Jump to latest
#1Daniel Serodio (lists)
daniel.lists@mandic.com.br

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';

Is this the proper mailing list for such feature requests?

Thanks in advance,
Daniel Serodio

#2Ryan Kelly
rpkelly22@gmail.com
In reply to: Daniel Serodio (lists) (#1)
Re: RFE: Column aliases in WHERE clauses

On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';

Is this the proper mailing list for such feature requests?

I think this is explicitly disallowed by the spec.

And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php

Thanks in advance,
Daniel Serodio

-Ryan Kelly

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Ryan Kelly (#2)
Re: RFE: Column aliases in WHERE clauses

On Sep 17, 2012, at 17:42, Ryan Kelly <rpkelly22@gmail.com> wrote:

On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';

Is this the proper mailing list for such feature requests?

I think this is explicitly disallowed by the spec.

Just to be clear, the spec does not care where you post your feature requests...it is the feature that it disallows.

When in doubt the general list is a good choice and all the key people monitor it and will move the discussion elsewhere if warranted.

David J.

#4Daniel Serodio (lists)
daniel.lists@mandic.com.br
In reply to: Ryan Kelly (#2)
Re: RFE: Column aliases in WHERE clauses

Ryan Kelly wrote:

On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter> 'a';

Is this the proper mailing list for such feature requests?

I think this is explicitly disallowed by the spec.

And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php

Tom's explanation makes perfect sense, thanks for the pointer.

Regards,
Daniel Serodio

#5Mike Christensen
mike@kitchenpc.com
In reply to: Daniel Serodio (lists) (#4)
Re: RFE: Column aliases in WHERE clauses

On Mon, Sep 17, 2012 at 3:10 PM, Daniel Serodio (lists)
<daniel.lists@mandic.com.br> wrote:

Ryan Kelly wrote:

On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter > 'a';

Is this the proper mailing list for such feature requests?

I think this is explicitly disallowed by the spec.

And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php

Tom's explanation makes perfect sense, thanks for the pointer.

This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';

Obviously, you can write this as:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';

This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).

Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.
Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.

Mike

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mike Christensen (#5)
Re: RFE: Column aliases in WHERE clauses

Mike Christensen <mike@kitchenpc.com> writes:

This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';

Obviously, you can write this as:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';

This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).

Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.

Yes, it is. If you read the spec you'll find that the scope of
visibility of names defined in the SELECT list doesn't include WHERE.

It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:

1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).

2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.

3. If there's a GROUP BY, merge the surviving rows into groups.

4. If there's aggregate functions, compute those over the rows in
each group.

5. If there's a HAVING, filter the grouped rows according to that.

6. Evaluate the SELECT expressions for each remaining row.

7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.

(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)

The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions
are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.

This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.

Now, having said all that, if you try it you'll find that Postgres
does allow select column references in GROUP BY, using the model
you propose above of copying whatever expression is in SELECT into
GROUP BY. This is, to put it politely, a mistake that we are now
stuck with for backwards-compatibility reasons. It's not spec compliant
and it doesn't fit the language's conceptual model, but it's been that
way for long enough that we're not likely to take it out. We are not,
however, gonna introduce the same mistake elsewhere.

Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.

CTEs or sub-selects are a better answer for that. Each sub-select has
its own instance of the conceptual pipeline.

regards, tom lane

#7Mike Christensen
mike@kitchenpc.com
In reply to: Tom Lane (#6)
Re: RFE: Column aliases in WHERE clauses

On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mike Christensen <mike@kitchenpc.com> writes:

This definitely makes sense in the context of aggregation, but I'm
wondering if the same argument applies in the use case originally
posted:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE first_letter > 'a';

Obviously, you can write this as:

SELECT left(value, 1) as first_letter
FROM some_table
WHERE left(value, 1) > 'a';

This would run fine, though you'd be doing a sequential scan on the
entire table, getting the left most character in each value, then
filtering those results. This of course assumes you haven't built an
index on left(value, 1).

Thus, in theory the compiler *could* resolve the actual definition of
first_letter and substitute in that expression on the fly. I'm
wondering if that concept is actually disallowed by the SQL spec.

Yes, it is. If you read the spec you'll find that the scope of
visibility of names defined in the SELECT list doesn't include WHERE.

It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:

1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).

2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.

3. If there's a GROUP BY, merge the surviving rows into groups.

4. If there's aggregate functions, compute those over the rows in
each group.

5. If there's a HAVING, filter the grouped rows according to that.

6. Evaluate the SELECT expressions for each remaining row.

7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.

(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)

The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions
are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.

This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.

Now, having said all that, if you try it you'll find that Postgres
does allow select column references in GROUP BY, using the model
you propose above of copying whatever expression is in SELECT into
GROUP BY. This is, to put it politely, a mistake that we are now
stuck with for backwards-compatibility reasons. It's not spec compliant
and it doesn't fit the language's conceptual model, but it's been that
way for long enough that we're not likely to take it out. We are not,
however, gonna introduce the same mistake elsewhere.

Obviously, it would add complexity (and compile overhead) but would be
somewhat handy to avoid repeating really complicated expressions.
Perhaps Common Table Expressions are a better way of doing this thing
anyhow.

CTEs or sub-selects are a better answer for that. Each sub-select has
its own instance of the conceptual pipeline.

Excellent information, Tom! I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..

Mike

#8Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: RFE: Column aliases in WHERE clauses

On 09/18/2012 07:32 AM, Tom Lane wrote:

It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution.

I just wish they hadn't written it backwards!

It'd be much less confusing were it formulated as something like:

SELECT
FROM thetable
WHERE first_letter > 'a'
RESULTS left(value,1) AS first_letter

or something, where the order is more obvious. I really dislike the way
SQL is written not-quite-backwards.

--
Craig Ringer

#9Rafal Pietrak
rafal@zorro.isa-geek.com
In reply to: Mike Christensen (#7)
Re: RFE: Column aliases in WHERE clauses

On Mon, 2012-09-17 at 16:44 -0700, Mike Christensen wrote:

On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mike Christensen <mike@kitchenpc.com> writes:

[-----------]

It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution. Conceptually, what
happens is:

1. Form the cartesian product of the tables listed in FROM (ie, all
combinations of rows).

2. Apply the WHERE condition to each row from 1, and drop rows that
don't pass it.

3. If there's a GROUP BY, merge the surviving rows into groups.

4. If there's aggregate functions, compute those over the rows in
each group.

5. If there's a HAVING, filter the grouped rows according to that.

6. Evaluate the SELECT expressions for each remaining row.

7. If there's an ORDER BY, evaluate those expressions and sort the
remaining rows accordingly.

(Obviously, implementations try to improve on this - you don't want
to actually form the cartesian product - but that's the conceptual
model.)

The traditional shortcut of doing "ORDER BY select-column-reference"
is okay according to this world view, because the SELECT expressions

[--------------]

are already available when ORDER BY needs them. However, it's not
sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY,
because those steps precede the evaluation of the SELECT expressions.

This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen.

[-----------------]

Excellent information, Tom! I've been somewhat curious on this
behavior for some time now, and it's great to get a detailed answer..

Yes.

But it puzzles me, if it *conceptually* would be a signifficant
misstake, when what Tom calls "select-column-reference" (I understand
as: the colunm name introduced on the select-list), would actually be
regarded by the SQL parser as "macro-definition". Just to place the
*string*, defined at select-list-level by "AS <name>", to wherever it's
used in the WHERE/ORDER/GROUP or HAVING clauses at earlier then
select-list-evaluation processing stage.

Actual Tom's example(1):
SELECT 1/x AS inverse FROM data WHERE x <> 0;
extended to (2):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND 1/x > 20;
could be written by user as (3):
SELECT 1/x AS inverse FROM data WHERE x <> 0 AND inverse > 20;
but token/replaced to its form (2) before WHERE evaluation.

-R

#10Eden Cardim
eden@insoli.de
In reply to: Daniel Serodio (lists) (#1)
Column aliases in WHERE clauses

"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:

Craig> I just wish they hadn't written it backwards!

Craig> It'd be much less confusing were it formulated as something
Craig> like:

Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letter

Craig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.

It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.

--
Eden Cardim
http://insoli.de

#11Chris Angelico
rosuav@gmail.com
In reply to: Craig Ringer (#8)
Re: RFE: Column aliases in WHERE clauses

On Tue, Sep 18, 2012 at 4:44 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:

On 09/18/2012 07:32 AM, Tom Lane wrote:

It's easier to understand why this is if you realize that SQL has a very
clear model of a "pipeline" of query execution.

I just wish they hadn't written it backwards!

It'd be much less confusing were it formulated as something like:

SELECT
FROM thetable
WHERE first_letter > 'a'
RESULTS left(value,1) AS first_letter

or something, where the order is more obvious. I really dislike the way SQL
is written not-quite-backwards.

I agree, SQL has its quirks. There are general principles, but they're
all violated somewhere... "UNION ALL" annoys me some by breaking the
rule that more words = more work, which is almost universally adhered
to elsewhere. But such is the nature of specifications.

ChrisA

#12Mike Christensen
mike@kitchenpc.com
In reply to: Eden Cardim (#10)
Re: Column aliases in WHERE clauses

On Tue, Sep 18, 2012 at 2:53 AM, Eden Cardim <eden@insoli.de> wrote:

"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:

Craig> I just wish they hadn't written it backwards!

Craig> It'd be much less confusing were it formulated as something
Craig> like:

Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letter

Craig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.

It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.

Oh no, this debate again..

I do admit LINQ kind of threw me for a loop as they took the other
approach (from f in foo where f.id > 5 select f), which makes you
think about the collection you're working with first. I usually think
about the table first when I'm writing a query.

I can also say if the table came before the columns, we'd probably
have a lot more SQL editors with auto-complete that worked :)

Mike

#13Eden Cardim
eden@insoli.de
In reply to: Mike Christensen (#12)
Re: Column aliases in WHERE clauses

"Mike" == Mike Christensen <mike@kitchenpc.com> writes:

Mike> I can also say if the table came before the columns, we'd
Mike> probably have a lot more SQL editors with auto-complete that
Mike> worked :)

There's nothing stopping an editor from making you type the table
first though, it's easier to implement that in an editor than it is to
change the way people have been thinking about math for the last few
of centuries.

#14Mike Christensen
mike@kitchenpc.com
In reply to: Eden Cardim (#13)
Re: Column aliases in WHERE clauses

On Tue, Sep 18, 2012 at 11:20 AM, Eden Cardim <eden@insoli.de> wrote:

"Mike" == Mike Christensen <mike@kitchenpc.com> writes:

Mike> I can also say if the table came before the columns, we'd
Mike> probably have a lot more SQL editors with auto-complete that
Mike> worked :)

There's nothing stopping an editor from making you type the table
first though, it's easier to implement that in an editor than it is to
change the way people have been thinking about math for the last few
of centuries.

No, I meant editors that auto-complete SQL statements for you as
you're typing them. Like Intellisense in Visual Studio.

Obviously you wouldn't want to type "select " and then see a list of
every column in your database, since it doesn't know what table you
want yet.

Mike

#15Eden Cardim
eden@insoli.de
In reply to: Mike Christensen (#14)
Re: Column aliases in WHERE clauses

"Mike" == Mike Christensen <mike@kitchenpc.com> writes:

Mike> No, I meant editors that auto-complete SQL statements for
Mike> you as you're typing them. Like Intellisense in Visual
Mike> Studio.

Mike> Obviously you wouldn't want to type "select " and then see a
Mike> list of every column in your database, since it doesn't know
Mike> what table you want yet.

Again, you can type "select" and the have the editor expand the whole
thing into "select _ from [cursor is now here]" and present you with a
list of tables for the autocomplete, then after you've selected the
table, it goes back to the field part of the query. I have emacs
rigged to do this with yasnippet, except that it relies on a
handwritten function, not intellisense. Point stands, easier done in a
single editor as opposed to changing the semantics of math.

#16Chris Travers
chris.travers@gmail.com
In reply to: Eden Cardim (#10)
Re: Column aliases in WHERE clauses

On Tue, Sep 18, 2012 at 2:53 AM, Eden Cardim <eden@insoli.de> wrote:

"Craig" == Craig Ringer <ringerc@ringerc.id.au> writes:

Craig> I just wish they hadn't written it backwards!

Craig> It'd be much less confusing were it formulated as something
Craig> like:

Craig> SELECT FROM thetable WHERE first_letter > 'a' RESULTS
Craig> left(value,1) AS first_letter

Craig> or something, where the order is more obvious. I really
Craig> dislike the way SQL is written not-quite-backwards.

It's not "written backwards", it's plain natural language semantics:
"give me the first letter of all records where the first letter is
greater than a". Refining a set is better done walking from the more
general set to a subset, not the other way around, IMO: "give me all
persons that are females and over the age of 20". Mathematical set
builder notation does this in a similar fashion, for the same reason.

Natural language semantics will get you into trouble though. After all, I
think Lisp follows natural language semantics remarkably closely if your
natural language is Irish Gaelic....

Best Wishes,
Chris Travers

#17Steve Haresnape
s.haresnape@creativeintegrity.co.nz
In reply to: Chris Angelico (#11)
Re: RFE: Column aliases in WHERE clauses

Hi There,

I've snipped a piece for the daily digest because I take issue with what's asserted here as a reason for not allowing aliases in where clauses.

<< snip This isn't just academic nit-picking either, because the SELECT
expressions might not be valid for rows that don't pass WHERE etc.
Consider
SELECT 1/x AS inverse FROM data WHERE x <> 0;
The implementation *must* apply WHERE before computing the SELECT
expressions, or it'll get zero-divide failures that should not happen. end snip>>

Irrespective of whether the standard prohibits aliases in where clauses, the reasoning here is irrelevant to the discussion at hand.

If I say:
SELECT 1/x AS inverse FROM data WHERE x <> 0 or inverse > 0.5 (for arguments sake) (in a SQL dialect that supports it)
then I must expect inverse to be evaluated for every row, exactly as if I said:
SELECT 1/x AS inverse FROM data WHERE x <> 0 or 1/x > 0.5

It's surely not the role of the standard to protect us from the consequences of our own folly.

Since some dialects support the idiom and others don't there can't be any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.

--
Steve Haresnape
Creative Integrity Ltd

In reply to: Chris Travers (#16)
Re: Column aliases in WHERE clauses

On 19/09/2012 04:57, Chris Travers wrote:

Natural language semantics will get you into trouble though. After all,
I think Lisp follows natural language semantics remarkably closely if
your natural language is Irish Gaelic....

Really? I haven't used Irish seriously since I left school - maybe I
should learn Lisp.... :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie

#19Craig Ringer
craig@2ndquadrant.com
In reply to: Steve Haresnape (#17)
Re: RFE: Column aliases in WHERE clauses

On 09/19/2012 02:36 PM, Steve Haresnape wrote:

Since some dialects support the idiom and others don't there can't be
any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.

So you're saying that you want the implementation to effectively clone
the aliased SELECT term into the WHERE clause?

If so, what about functions with side-effects?

--
Craig Ringer

#20Rafal Pietrak
rafal@zorro.isa-geek.com
In reply to: Craig Ringer (#19)
Re: RFE: Column aliases in WHERE clauses

On Wed, 2012-09-19 at 20:36 +0800, Craig Ringer wrote:

On 09/19/2012 02:36 PM, Steve Haresnape wrote:

Since some dialects support the idiom and others don't there can't be
any compelling reason to withhold support. It's really a matter of style.
For my money the DRY style is better.

So you're saying that you want the implementation to effectively clone
the aliased SELECT term into the WHERE clause?

If so, what about functions with side-effects?

What about them: if they are put in the where clause by the user - e.g.
not from unaliasing?

Just bug-traceing will be more difficult from alias obfuscating effects.
That's all.

-R

#21David G. Johnston
david.g.johnston@gmail.com
In reply to: Steve Haresnape (#17)
#22Chris Angelico
rosuav@gmail.com
In reply to: David G. Johnston (#21)
#23David G. Johnston
david.g.johnston@gmail.com
In reply to: Chris Angelico (#22)
#24Jasen Betts
jasen@xnet.co.nz
In reply to: Daniel Serodio (lists) (#1)
#25Chris Travers
chris.travers@gmail.com
In reply to: Jasen Betts (#24)
#26Rafal Pietrak
rafal@zorro.isa-geek.com
In reply to: Chris Travers (#25)