any way for ORDER BY x to imply NULLS FIRST in 8.3?
Is there any way to "hardcode" the NULL handling in an index (as per
http://www.postgresql.org/docs/8.3/static/indexes-ordering.html) so that
SELECT * FROM t ORDER BY foo automatically implies NULLS FIRST (and,
similarly so that SELECT * FROM t ORDER BY foo DESC automatically
implies NULLS LAST)? Thing is, I'm using PHP Symfony/Propel to generate
their SQL and have no easy way to influence how they do so.
Thanks.
rihad <rihad@mail.ru> writes:
Is there any way to "hardcode" the NULL handling in an index (as per
http://www.postgresql.org/docs/8.3/static/indexes-ordering.html) so that
SELECT * FROM t ORDER BY foo automatically implies NULLS FIRST (and,
similarly so that SELECT * FROM t ORDER BY foo DESC automatically
implies NULLS LAST)?
No. This is not a question of how indexes behave, it is a question of
which semantics the parser ascribes to an undecorated ORDER BY request.
Thing is, I'm using PHP Symfony/Propel to generate
their SQL and have no easy way to influence how they do so.
SQL99 section 14.1 <declare cursor> saith:
... Whether
a sort key value that is null is considered greater or less
than a non-null value is implementation-defined, but all sort
key values that are null shall either be considered greater
than all non-null values or be considered less than all non-
null values. ...
Essentially the same text appears in SQL2003. Any application that
depends on one particular choice here is therefore broken, or at least
has chosen to work with only about half of the DBMSes in the world.
regards, tom lane
On Tue, 2007-11-06 at 11:38 +0400, rihad wrote:
Is there any way to "hardcode" the NULL handling in an index (as per
http://www.postgresql.org/docs/8.3/static/indexes-ordering.html) so
that
SELECT * FROM t ORDER BY foo automatically implies NULLS FIRST (and,
similarly so that SELECT * FROM t ORDER BY foo DESC automatically
implies NULLS LAST)? Thing is, I'm using PHP Symfony/Propel to
generate
their SQL and have no easy way to influence how they do so.
As Tom already stated, the ordering of NULLs with respect to non-NULLs
is defined by the implementation.
However, it's not clear that you've considered a clause like 'ORDER BY
(foo IS NULL), foo', which I believe is not implementation dependent.
(In SQL2003 draft, true is defined to sort before false. I can't find a
similar statement in SQL92 or SQL99.)
Bear in mind that the ordering of rows with the same value (incl. NULL)
of foo is arbitrary. And, I guess that the equivalence of this query on
two systems will depend on the collating locale also (but I'm a neophyte
in this area).
-Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
Reece Hart <reece@harts.net> writes:
However, it's not clear that you've considered a clause like 'ORDER BY
(foo IS NULL), foo', which I believe is not implementation dependent.
Yeah, that should work reasonably portably ... where "portable" means
"equally lousy performance in every implementation", unfortunately :-(.
I rather doubt that many implementations will see through that to decide
that they can avoid an explicit sort.
(In SQL2003 draft, true is defined to sort before false. I can't find a
similar statement in SQL92 or SQL99.)
SQL92 doesn't actually acknowledge boolean as a data type, so it's not
gonna say that; but SQL99 does, and it has
The value true_ is greater than the value false_
under 4.6.1 Comparison and assignment of booleans
regards, tom lane
On Tue, 2007-11-06 at 09:48 -0500, Tom Lane wrote:
Essentially the same text appears in SQL2003. Any application that
depends on one particular choice here is therefore broken, or at least
has chosen to work with only about half of the DBMSes in the world.
If an application has already made that choice then we should allow them
the opportunity to work with PostgreSQL. The application may be at
fault, but PostgreSQL is the loser because of that decision.
The SQL Standard says that the default for this is defined by the
implementation; that doesn't bar us from changing the implementation if
we wish. We can do that without changing PostgreSQL's historic default.
Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'
(or another wording/meaning.)
That is what I thought you'd implemented, otherwise I would have
suggested this myself way back. This new parameter would be a small
change, but will make a major difference to application portability.
This seems like the key to unlocking your new functionality for most
people.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
If an application has already made that choice then we should allow them
the opportunity to work with PostgreSQL. The application may be at
fault, but PostgreSQL is the loser because of that decision.The SQL Standard says that the default for this is defined by the
implementation; that doesn't bar us from changing the implementation if
we wish. We can do that without changing PostgreSQL's historic default.Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'
(or another wording/meaning.)
That is what I thought you'd implemented, otherwise I would have
suggested this myself way back. This new parameter would be a small
change, but will make a major difference to application portability.This seems like the key to unlocking your new functionality for most
people.
You already have that control at the SQL SELECT level so you are just
avoiding typing to add the GUC parameter. I think we need more requests
for such a feature before we add it.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Wed, 2007-11-07 at 08:38 -0500, Bruce Momjian wrote:
Simon Riggs wrote:
If an application has already made that choice then we should allow them
the opportunity to work with PostgreSQL. The application may be at
fault, but PostgreSQL is the loser because of that decision.The SQL Standard says that the default for this is defined by the
implementation; that doesn't bar us from changing the implementation if
we wish. We can do that without changing PostgreSQL's historic default.Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'
(or another wording/meaning.)
That is what I thought you'd implemented, otherwise I would have
suggested this myself way back. This new parameter would be a small
change, but will make a major difference to application portability.This seems like the key to unlocking your new functionality for most
people.You already have that control at the SQL SELECT level so you are just
avoiding typing to add the GUC parameter.
My understanding is that both MySQL and MSSQL support NULLS FIRST by
default, so being able to accept much of their SQL without change would
be a huge win.
Editing an application, you would be required to add the words NULLS
FIRST to every single ORDER BY and every single CREATE INDEX in an
application. If we know that is what people would do, why not have one
parameter to do this for them?
Now imagine you are writing an application that has to work on multiple
databases. Can you realistically create a workable framework that has
the SQL written in multiple different ways? That issue is the big issue
preventing many off-the-shelf software vendors from supporting Postgres.
Say you did decide to edit the application. As soon as you edit the SQL
within an application it typically will violate any support contract in
place. That's usually enough to prevent even the brave from doing this.
One might argue that SQL generators such as Hibernate can automatically
and easily generate the required SQL, so they don't need this. That's
very nice to know we'll be able to use the new feature maybe 10-20% of
the time, but what about other applications?
We already have parameters of this category, for example:
default_with_oids == WITH OIDS text on CREATE TABLE
default_transaction_isolation...
default_read_only...
plus many of the other GUCs in statement behaviour section of the Server
Configuration chapter.
add mising from
transform null equals etc
http://developer.postgresql.org/pgdocs/postgres/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-CLIENTS
I think we need more requests for such a feature before we add it.
Almost none of the features we've added have come by request. Features
get added because we see the benefit ourselves. Yes, we should discuss
this more widely; I'm confident many others will see the benefit in
allowing migration from other systems to happen more easily.
What we have now implements SQL Standard behaviour. I think that's
uninteresting for 99% of applications. I believe in standardisation, but
nobody gets excited about it. There are few applications that will
specify NULLS FIRST for a few queries only, actually coding that into
the SQL.
Implement SQLServer and MySQL behaviour? Now we're talking about
hundreds of new applications that might decide to migrate/support
PostgreSQL because of our flexibility in being able to support both
kinds of sorting.
It's going to be a short patch.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Wed, Nov 07, 2007 at 02:37:41PM +0000, Simon Riggs wrote:
Editing an application, you would be required to add the words NULLS
FIRST to every single ORDER BY and every single CREATE INDEX in an
application. If we know that is what people would do, why not have one
parameter to do this for them?
I find it hard to beleive that every single query in an application
depends on the ordering of NULLs. In fact, I don't think I've even
written a query that depended on a particular way of sorting NULLs. Is
it really that big a deal?
Implement SQLServer and MySQL behaviour? Now we're talking about
hundreds of new applications that might decide to migrate/support
PostgreSQL because of our flexibility in being able to support both
kinds of sorting.
TBH I think long term is should be attached to each column, as it is a
property of the collation (my COLLATE patch let you specify it per
column).
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy
Simon Riggs <simon@2ndquadrant.com> writes:
Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'
Not unless it's locked down at initdb time. Otherwise flipping the
value bars you from using every existing index ... including those
on the system catalogs ... which were made with the other setting.
I put this in the same category as altering the identifier case-folding
rules. Yeah, it'd be great to be all things to all people, but the
implementation pain and risk of breakage of existing applications
isn't worth it.
regards, tom lane
On Wed, 2007-11-07 at 16:05 +0100, Martijn van Oosterhout wrote:
On Wed, Nov 07, 2007 at 02:37:41PM +0000, Simon Riggs wrote:
Editing an application, you would be required to add the words NULLS
FIRST to every single ORDER BY and every single CREATE INDEX in an
application. If we know that is what people would do, why not have one
parameter to do this for them?I find it hard to beleive that every single query in an application
depends on the ordering of NULLs. In fact, I don't think I've even
written a query that depended on a particular way of sorting NULLs. Is
it really that big a deal?
True, but how would you know for certain? You'd need to examine each
query to be able to tell, which would take even longer. Or would you not
bother, catch a few errors in test and then wait for the application to
break in random ways when a NULL is added later? I guess that's what
most people do, if they do convert.
I'd like to remove one difficult barrier to Postgres adoption. We just
need some opinions from people who *havent* converted to Postgres, which
I admit is difficult cos they're not listening.
Implement SQLServer and MySQL behaviour? Now we're talking about
hundreds of new applications that might decide to migrate/support
PostgreSQL because of our flexibility in being able to support both
kinds of sorting.TBH I think long term is should be attached to each column, as it is a
property of the collation (my COLLATE patch let you specify it per
column).
That's a great idea, but orthogonal to the discussion about migrating
from other databases. No other database works like that, nor does the
SQL standard, but I'll admit its sound thinking otherwise.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
On Wed, 2007-11-07 at 10:23 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'Not unless it's locked down at initdb time. Otherwise flipping the
value bars you from using every existing index ... including those
on the system catalogs ... which were made with the other setting.
Seems reasonable, as a first step.
There are a number of things that need to be moved from initdb to be
settable parameters, so this is just one of them, for later releases. We
should be able to enforce one setting of the parameter at bootstrap
time, so the system indexes all get built the standard way with the
initdb locale. We can then be free to set the locale for indexes after
that, but that is another issue.
I put this in the same category as altering the identifier case-folding
rules.
That has much less effect on application portability, so although the
issues are similar the importance is not.
Yeah, it'd be great to be all things to all people, but the
implementation pain and risk of breakage of existing applications
isn't worth it.
I don't suggest we should be _all_ things to _all_ people, just that we
should try to be provide our capabilities to _more_ people. I think its
a great feature and I want to see more people appreciate that.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
On Wed, 2007-11-07 at 16:05 +0100, Martijn van Oosterhout wrote:
On Wed, Nov 07, 2007 at 02:37:41PM +0000, Simon Riggs wrote:
Editing an application, you would be required to add the words NULLS
FIRST to every single ORDER BY and every single CREATE INDEX in an
application. If we know that is what people would do, why not have one
parameter to do this for them?I find it hard to beleive that every single query in an application
depends on the ordering of NULLs. In fact, I don't think I've even
written a query that depended on a particular way of sorting NULLs. Is
it really that big a deal?True, but how would you know for certain? You'd need to examine each
query to be able to tell, which would take even longer. Or would you not
bother, catch a few errors in test and then wait for the application to
break in random ways when a NULL is added later? I guess that's what
most people do, if they do convert.I'd like to remove one difficult barrier to Postgres adoption. We just
need some opinions from people who *havent* converted to Postgres, which
I admit is difficult cos they're not listening.
May I, as an outsider, comment? :) I really think of ASC NULLS FIRST
(and DESC NULLS LAST) as the way to go. Imagine a last_login column that
sorts users that have not logged in as the most recently logged in,
which is not very intuitive. I vote for sort_nulls_first defaulting to
false in order not to break bc.
Simon Riggs <simon@2ndquadrant.com> writes:
On Wed, 2007-11-07 at 10:23 -0500, Tom Lane wrote:
I put this in the same category as altering the identifier case-folding
rules.
That has much less effect on application portability,
Really? Try counting the number of requests for that in the archives,
vs the number of requests for this.
regards, tom lane
On Wed, 2007-11-07 at 11:39 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
On Wed, 2007-11-07 at 10:23 -0500, Tom Lane wrote:
I put this in the same category as altering the identifier case-folding
rules.That has much less effect on application portability,
Really? Try counting the number of requests for that in the archives,
vs the number of requests for this.
I think you're arguing in favour of both changes, not burying my point.
Most applications don't hit the case folding issue for identifiers.
Certainly people have, but those are people doing things with metadata
like trying to write tools that work with both. They're database savvy
people who come on list and try and fix things.
Almost all applications have NULLs and use ORDER BY and indexes. That
doesn't mean everybody is effected by NULL sorting, but they might be
and probably don't realise.
I think you're right in identifying there are other issues for
portability. My list would be:
1. statement level abort
2. equivalent performance of identical SQL (e.g. NOT IN)
3. case insensitive searches
4. NULL ordering
5. case folding identifiers
Those differ depending upon the database.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
Simon Riggs <simon@2ndquadrant.com> writes:
Perhaps we can have a parameter?
default_null_sorting = 'last' # may alternatively be set to 'first'Not unless it's locked down at initdb time. Otherwise flipping the
value bars you from using every existing index ... including those
on the system catalogs ... which were made with the other setting.
Surely if we added this we would also add explicit NULLS LAST clauses to all
system catalog indexes and system views and make explicitly constructed scans
in the backend use NULLS LAST.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
Gregory Stark <stark@enterprisedb.com> writes:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
Not unless it's locked down at initdb time. Otherwise flipping the
value bars you from using every existing index ... including those
on the system catalogs ... which were made with the other setting.
Surely if we added this we would also add explicit NULLS LAST clauses to all
system catalog indexes and system views and make explicitly constructed scans
in the backend use NULLS LAST.
No, that's not the point; the point is that the performance of
*user-issued* queries (or even more to the point, psql or pg_dump-issued
queries) against the system catalogs would go to pot if they didn't
match the catalog ordering, and a run-time-dependent interpretation of
ORDER BY would make it very likely that the queries don't match, no
matter which underlying index ordering is installed.
Now, most if not all of the system indexes are on NOT NULL columns, so
one possible avenue to resolving that objection would be to teach the
planner that null sort direction can be disregarded when determining
whether an index on a not-null column matches a query. But that already
is making the patch 10x larger and more subtle than what Simon thinks
he's proposing; and I doubt it's the only change we'd find we needed.
A more general objection is that causing query semantics to change in
subtle ways based on a GUC variable has more often than not proven to be
a bad idea.
regards, tom lane
On 11/7/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
A more general objection is that causing query semantics to change in
subtle ways based on a GUC variable has more often than not proven to be
a bad idea.
On top of that, this is another one of those conversations that
basically are predicated on the premise that other databases have
quirks that make / encourage / allow the user to write bad SQL, and we
need to do something so that their bad SQL will run properly on
PostgreSQL.
I work with 3 Oracle DBAs, and they are all trained by Oracle (the
database, not the company) to write queries that make my brain hurt.
Case statement? nope, they use encode. And there are dozens of cases
where they use non-standard SQL, and they aren't going to stop any
time soon, because it's just what they know.
As someone who wishes we could switch case folding easily from lower
to upper for some use cases, I understand the desire of folks to want
things in pgsql to be easily switchable to fix these kinds of issues.
But I don't think most of them are worth the effort and the bugs that
could be introduced.
Reece Hart <reece@harts.net> writes:
However, it's not clear that you've considered a clause like 'ORDER BY
(foo IS NULL), foo', which I believe is not implementation dependent.Yeah, that should work reasonably portably ... where "portable" means
"equally lousy performance in every implementation", unfortunately :-(.
I rather doubt that many implementations will see through that to decide
that they can avoid an explicit sort.
Well, an index on ((foo IS NULL), foo) might improve the performance
when sorting along these columns, but sure it's not a cure-all. And you
still have to modify the SQL and the database schema ...
regards
TV
Em Wednesday 07 November 2007 13:54:32 rihad escreveu:
May I, as an outsider, comment? :) I really think of ASC NULLS FIRST
(and DESC NULLS LAST) as the way to go. Imagine a last_login column that
sorts users that have not logged in as the most recently logged in,
which is not very intuitive. I vote for sort_nulls_first defaulting to
false in order not to break bc.
But then, when ordering by login date, you should use COALESCE and infinity
for them
(http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html).
--
Jorge Godoy <jgodoy@gmail.com>
Em Wednesday 07 November 2007 13:54:32 rihad escreveu:
May I, as an outsider, comment? :) I really think of ASC NULLS FIRST
(and DESC NULLS LAST) as the way to go. Imagine a last_login column that
sorts users that have not logged in as the most recently logged in,
which is not very intuitive. I vote for sort_nulls_first defaulting to
false in order not to break bc.But then, when ordering by login date, you should use COALESCE and infinity
for them
(http://www.postgresql.org/docs/8.2/interactive/datatype-datetime.html).
It's not an easy thing to do with for example Propel 1.2 ORM (written in
PHP):
$criteria->addDescendingOrderByColumn(myPeer::LAST_LOGIN); // no place
to shove database-specific attributes in.
which was my main point.
Import Notes
Resolved by subject fallback