Does anybody use ORDER BY x USING y?

Started by Martijn van Oosterhoutover 20 years ago23 messageshackers

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

Hi,

PostgreSQL's grammer allows you to specify the operator to sort with in
the ORDER BY clause. Various bits of the backend support this feature,
yet it appears to partially undocumented. I can't find it in the ORDER
BY [1]http://www.postgresql.org/docs/8.0/interactive/queries-order.html section but there is a paragraph on it under the SELECT
documentation [2]http://www.postgresql.org/docs/8.0/interactive/sql-select.html -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/.

I'm asking because SQL COLLATE support is really doing something
similar. I was wondering if instead of adding something in parallel just
replace sortop with collateid. This means all the code relating to
pathkeys won't need to change since we still use OIDs for the pathkeys,
they're just not operator oids anymore.

We can continue to support USING [op] as long as [op] is one of the GT
or LT operators in the OPERATOR CLASS. This restriction may exist
already, I can't tell.

All we lose is the ability to say USING [arbitrary op]. Does anybody
use this. Would people object to requiring the operator after USING to
be part of an operator class?

Have a nice day,

[1]: http://www.postgresql.org/docs/8.0/interactive/queries-order.html
[2]: http://www.postgresql.org/docs/8.0/interactive/sql-select.html -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

josh@agliodbs.com

over 20 years ago

In reply to: Martijn van Oosterhout (#1)

Re: Does anybody use ORDER BY x USING y?

Martjin,

We can continue to support USING [op] as long as [op] is one of the GT
or LT operators in the OPERATOR CLASS. This restriction may exist
already, I can't tell.

All we lose is the ability to say USING [arbitrary op]. Does anybody
use this. Would people object to requiring the operator after USING to
be part of an operator class?

Hmmm ... would this prevent the hackish workaround for case-insensitive sort?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Josh Berkus (#2)

Re: Does anybody use ORDER BY x USING y?

On Sun, Sep 18, 2005 at 12:34:10PM -0700, Josh Berkus wrote:

All we lose is the ability to say USING [arbitrary op]. Does anybody
use this. Would people object to requiring the operator after USING to
be part of an operator class?

Hmmm ... would this prevent the hackish workaround for case-insensitive sort?

Err, which hackish workaround would that be? The right solution is
citext which creates it's own operator class. This doesn't have
anything to do with functional indexes either.

I've been using Google to find any interesting use of the USING clause
but havn't found any yet.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

andrew@dunslane.net

over 20 years ago

In reply to: Martijn van Oosterhout (#3)

Re: Does anybody use ORDER BY x USING y?

Martijn van Oosterhout wrote:

On Sun, Sep 18, 2005 at 12:34:10PM -0700, Josh Berkus wrote:

All we lose is the ability to say USING [arbitrary op]. Does anybody
use this. Would people object to requiring the operator after USING to
be part of an operator class?

Hmmm ... would this prevent the hackish workaround for case-insensitive sort?

Err, which hackish workaround would that be? The right solution is
citext which creates it's own operator class. This doesn't have
anything to do with functional indexes either.

Last time I looked it appeared to have significant limitations, and some
considerable inefficiencies (e.g, copying the strings and folding them
to canonical case on every comparison). I would certainly be extremely
wary of just saying "that's the solution".

cheers

andrew

john@geeknet.com.au

over 20 years ago

In reply to: Andrew Dunstan (#4)

Re: Does anybody use ORDER BY x USING y?

Martijn van Oosterhout Wrote:

All we lose is the ability to say USING [arbitrary op]. Does

anybody

use this. Would people object to requiring the operator after

USING

to be part of an operator class?

Hmmm ... would this prevent the hackish workaround for

case-insensitive sort?

Err, which hackish workaround would that be? The right
solution is citext which creates it's own operator class.
This doesn't have anything to do with functional indexes either.

I've been using Google to find any interesting use of the
USING clause but havn't found any yet.

I was actually of the impression that that was exacty what it was for:
specifying what op(class) to use for the sort in case you wanted to use
a non-default opclass for the type, and/or if the less-than operator
wasn't called '<'.

... John

Import Notes

Resolved by subject fallback

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Andrew Dunstan (#4)

Re: Does anybody use ORDER BY x USING y?

On Sun, Sep 18, 2005 at 04:19:06PM -0400, Andrew Dunstan wrote:

Err, which hackish workaround would that be? The right solution is
citext which creates it's own operator class. This doesn't have
anything to do with functional indexes either.

Last time I looked it appeared to have significant limitations, and some
considerable inefficiencies (e.g, copying the strings and folding them
to canonical case on every comparison). I would certainly be extremely
wary of just saying "that's the solution".

Ok, so citext has its limitations. Case-insensetive sort is hard [1]http://lafstern.org/matt/col2_new.pdf.
My real question was, what was the solution he was referring to using
the USING clause?

[1]: http://lafstern.org/matt/col2_new.pdf

--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: John Hansen (#5)

Re: Does anybody use ORDER BY x USING y?

On Mon, Sep 19, 2005 at 06:26:10AM +1000, John Hansen wrote:

I was actually of the impression that that was exacty what it was for:
specifying what op(class) to use for the sort in case you wanted to use
a non-default opclass for the type, and/or if the less-than operator
wasn't called '<'.

That's my thought. However, the code doesn't seem to restrict you to
that so I was wondering if there was any other use out there that we
should consider supporting...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Martijn van Oosterhout (#7)

Re: Does anybody use ORDER BY x USING y?

Martijn van Oosterhout <kleptog@svana.org> writes:

On Mon, Sep 19, 2005 at 06:26:10AM +1000, John Hansen wrote:

I was actually of the impression that that was exacty what it was for:
specifying what op(class) to use for the sort in case you wanted to use
a non-default opclass for the type, and/or if the less-than operator
wasn't called '<'.

That's my thought. However, the code doesn't seem to restrict you to
that so I was wondering if there was any other use out there that we
should consider supporting...

One of the half-baked ideas about operator classes that I mentioned a
few days ago was to either redesign or reinterpret USING in a way that
would make it easier to associate a btree opclass with a requested
ordering. I'm not sure that we want to *require* there to be a btree
opclass matching any ORDER BY request, but it's something to consider.
(There are some examples in the regression tests of ORDER BY using
operators that aren't in any btree opclass, but I'm not sure any of
them represent useful real-world cases. In principle, if the operator
represents a self-consistent ordering at all, then a btree opclass
could be built with it. So it could be argued that we're just
supporting programmer laziness to not require one.)

Right now we use some heuristics to try to identify an opclass
containing the mentioned operator, but this is pretty unreliable
and would become more so if reverse-sort opclasses became standard
equipment. Another thing that's flaky in the current treatment is
the question of whether NULLs sort before or after ordinary values.
We've essentially tried to force NULLs to sort "high" (as if they
compare greater than all ordinary values), so that ASC and DESC
orderings can be obtained from forward and backwards scans of an
ordinary btree index. This is going to break entirely in the
presence of reverse-sort opclasses --- given the current btree code,
such an opclass would cause NULLs to appear to sort "low". I suspect
we have to bring out the NULL sort behavior as an explicit property
of opclasses, but I'm not sure just how to do that. A related point
is that we not infrequently get requests for a way to make ORDER BY
sort nulls low; it'd be nice if we could actually support that,
rather than going in the direction of making sure it can't happen.

regards, tom lane

hannu@tm.ee

over 20 years ago

In reply to: Tom Lane (#8)

Re: Does anybody use ORDER BY x USING y?

On P, 2005-09-18 at 18:04 -0400, Tom Lane wrote:

Another thing that's flaky in the current treatment is
the question of whether NULLs sort before or after ordinary values.
We've essentially tried to force NULLs to sort "high" (as if they
compare greater than all ordinary values), so that ASC and DESC
orderings can be obtained from forward and backwards scans of an
ordinary btree index. This is going to break entirely in the
presence of reverse-sort opclasses --- given the current btree code,
such an opclass would cause NULLs to appear to sort "low". I suspect
we have to bring out the NULL sort behavior as an explicit property
of opclasses, but I'm not sure just how to do that. A related point
is that we not infrequently get requests for a way to make ORDER BY
sort nulls low; it'd be nice if we could actually support that,
rather than going in the direction of making sure it can't happen.

I think that placement of NULL's should be a property of ORDER BY and
separated from opclass.

From: http://opensource2.atlassian.com/projects/hibernate/browse/HHH-465
------------------------------------------------------------------------

support of nulls first / last in order clause

"NULLS LAST" is part of the SQL 99 standard.

The syntax is as follows:

ORDER BY [COLUMN NAME] [ASC | DESC] [NULLS FIRST | NULLS LAST]

In different DBs, the sorting of nulls relative to other values is
handled differently.

PostgreSQL - Nulls are considered HIGHER than non-nulls.
DB2 - Higher
MSSQL - Lower
MySQL - Lower
Oracle - Higher

The following DBs have supported this functionality:
DB2 V7
Oracle 9i

PostgreSQL, MySQL, SQLServer do not appear to support this from what I
can gather.

see http://forum.hibernate.org/viewtopic.php?
t=942176&start=0&postdays=0&postorder=asc&highlight=

--
Hannu Krosing <hannu@skype.net>

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Hannu Krosing (#9)

Re: Does anybody use ORDER BY x USING y?

Hannu Krosing <hannu@skype.net> writes:

I think that placement of NULL's should be a property of ORDER BY and
separated from opclass.

That would be an extremely bad idea, because it would immediately remove
index scans as one way to meet an ORDER BY. I'm thinking in terms of
NULL high/low as becoming a property of btree opclasses so that indexes
know what to do with nulls, and so that the planner can tell whether a
given index meets the required sort ordering or not.

Alternatively we could define an index's ordering as being specified by
both an opclass and a NULL direction, but that doesn't seem better to
me; especially since the null-direction concept doesn't seem meaningful
for non-btree indexes at all, but a structure like that would require us
to associate a null-direction with all indexes.

regards, tom lane

andrew@dunslane.net

over 20 years ago

In reply to: Tom Lane (#10)

Re: Does anybody use ORDER BY x USING y?

Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

I think that placement of NULL's should be a property of ORDER BY and
separated from opclass.

That would be an extremely bad idea, because it would immediately remove
index scans as one way to meet an ORDER BY. I'm thinking in terms of
NULL high/low as becoming a property of btree opclasses so that indexes
know what to do with nulls, and so that the planner can tell whether a
given index meets the required sort ordering or not.

Alternatively we could define an index's ordering as being specified by
both an opclass and a NULL direction, but that doesn't seem better to
me; especially since the null-direction concept doesn't seem meaningful
for non-btree indexes at all, but a structure like that would require us
to associate a null-direction with all indexes.

Not sure I understand ... in fact I am sure I don't :-)

Are you envisioning that the null direction will be able to be selected
at the time of the select statement?

cheers

andrew

bruce@momjian.us

over 20 years ago

In reply to: Tom Lane (#10)

Re: Does anybody use ORDER BY x USING y?

Tom Lane <tgl@sss.pgh.pa.us> writes:

Hannu Krosing <hannu@skype.net> writes:

I think that placement of NULL's should be a property of ORDER BY and
separated from opclass.

That would be an extremely bad idea, because it would immediately remove
index scans as one way to meet an ORDER BY.

Well couldn't the index scan be taught to go fetch the NULLs in a separate
traversal?

--
greg

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Andrew Dunstan (#11)

Re: Does anybody use ORDER BY x USING y?

Andrew Dunstan <andrew@dunslane.net> writes:

Not sure I understand ... in fact I am sure I don't :-)
Are you envisioning that the null direction will be able to be selected
at the time of the select statement?

Yes, of course. My point is that we need to define "operator class" as
"all you need to know about the behavior of a particular index column".
Moving away from that equivalence is just going to mess things up with
no redeeming social benefit.

This looks bad, because the first conclusion is that for any particular
comparison function (eg, int4cmp) you'd want four separate operator
classes, to cover the combinations of ASC-sort and DESC-sort versus
NULLs-high and NULLs-low. But you'd be paying for that complication
somewhere, and ISTM the operator class abstraction is exactly the right
level to pay it at. We could ease the pain for creators of user-defined
types by inventing some mechanism that automatically creates the whole
set of operator classes --- this is another idea that's barely half
baked yet, but I think it ties in nicely with the idea of "operator
class families" to relate opclasses for different datatypes. Basically
I'd like to solve most of these issues by constructing a new layer atop
opclasses, not by deciding that an opclass doesn't convey the full story
about the behavior of an index column.

regards, tom lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Bruce Momjian (#12)

Re: Does anybody use ORDER BY x USING y?

Greg Stark <gsstark@mit.edu> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

That would be an extremely bad idea, because it would immediately remove
index scans as one way to meet an ORDER BY.

Well couldn't the index scan be taught to go fetch the NULLs in a separate
traversal?

(1) IS NULL is not an indexable operation, so no, not without
significant overhaul of the index AM API.

(2) This propagates a problem that is specific to orderable indexes (ie
btree) into code that is generic to all indexes, and thus creates the
problem of how do you deal with specifying NULL ordering without any
definition of ordering for non-NULLs.

(3) You still have to invent a mechanism to define whether you want
nulls first or last ... and make sure that that mechanism works for
plans that use explicit SORT steps as well as those that use index
scans.

regards, tom lane

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Tom Lane (#13)

Re: Does anybody use ORDER BY x USING y?

On Sun, Sep 18, 2005 at 11:23:01PM -0400, Tom Lane wrote:
<snip>

class families" to relate opclasses for different datatypes. Basically
I'd like to solve most of these issues by constructing a new layer atop
opclasses, not by deciding that an opclass doesn't convey the full story
about the behavior of an index column.

Where I'm currently going is creating a table of COLLATE orders. These
collate orders would refer to operator classes but "tweak" them. For
example, things like:

- Sort ascending or descending (descending reverses the bt*cmp test)
- NULLs first or last
- Locale for text types
- etc

They could be declared in the operator class definition, or generated
automatically. You could then do things like:

CREATE INDEX ... (field1 COLLATE ascending, field2 COLLATE descending)

for those queries where you want ascending on one column and descending
on another. Or perhaps:

CREATE INDEX ... (textfield COLLATE ignore_case)
CREATE INDEX ... (textfield COLLATE locale_us)
CREATE INDEX ... (textfield COLLATE optimise_regex)
CREATE INDEX ... (point COLLATE distance)

However, I can't see how this can relate "families" of operator classes
like you talk about Tom. ISTM that needs to dealt with somewhere else,
given that it's unrelated to order.

This is going way out of spec though...
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

hannu@tm.ee

over 20 years ago

In reply to: Tom Lane (#14)

Re: Does anybody use ORDER BY x USING y?

On P, 2005-09-18 at 23:34 -0400, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

That would be an extremely bad idea, because it would immediately remove
index scans as one way to meet an ORDER BY.

Well couldn't the index scan be taught to go fetch the NULLs in a separate
traversal?

(1) IS NULL is not an indexable operation, so no, not without
significant overhaul of the index AM API.

But we do store NULLs in indexes, so why is it not indexable?

This is either an interface bug (not making use of stored info) or
storage bug (wasting space storing unneccessary info)

(2) This propagates a problem that is specific to orderable indexes (ie
btree) into code that is generic to all indexes, and thus creates the
problem of how do you deal with specifying NULL ordering without any
definition of ordering for non-NULLs.

we dont need an ordering of NULLs for cases without ORDER BY. You can't
specify NULLS FIRST/LAST without ORDER BY.

When one needs to use index for ordering we could use a plan like

APPEND
INDEX SCAN FOR NULLS, FILTER IS NULL
INDEX SCAN FOR NOT NULLS, FILTER IS NOT NULL

if NULL's are needed to be returned as sorted first/last

If no index scan is used, sorting code should be made smart enough to
recognize nulls and deal with it.

(3) You still have to invent a mechanism to define whether you want
nulls first or last ... and make sure that that mechanism works for
plans that use explicit SORT steps as well as those that use index
scans.

The main place I see problems is multiple field indexes, where some non-
first field is null. For single field indexes simply making two index
scans, possibly in different directions seems easy.

--
Hannu Krosing <hannu@skype.net>

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Hannu Krosing (#16)

Re: Does anybody use ORDER BY x USING y?

On Mon, Sep 19, 2005 at 11:13:05AM +0300, Hannu Krosing wrote:

(1) IS NULL is not an indexable operation, so no, not without
significant overhaul of the index AM API.

But we do store NULLs in indexes, so why is it not indexable?

This is either an interface bug (not making use of stored info) or
storage bug (wasting space storing unneccessary info)

Err, indexes used to not store NULLs to save space. However, it turns
out that SQL UNIQUE has something to say about NULLs in unique columns
so they had to be included.

However, the machinary to decide if an index is usable assumes that
usable operators have two arguments and IS NULL isn't really an
operator in the PostgreSQL sense and doesn't have two arguments either.

*If* that can be fixed, then we can be more flexible. But if it were
easy it would have been done long ago...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

hannu@tm.ee

over 20 years ago

In reply to: Martijn van Oosterhout (#17)

Re: Does anybody use ORDER BY x USING y?

On E, 2005-09-19 at 11:24 +0200, Martijn van Oosterhout wrote:

On Mon, Sep 19, 2005 at 11:13:05AM +0300, Hannu Krosing wrote:

(1) IS NULL is not an indexable operation, so no, not without
significant overhaul of the index AM API.

But we do store NULLs in indexes, so why is it not indexable?

This is either an interface bug (not making use of stored info) or
storage bug (wasting space storing unneccessary info)

Err, indexes used to not store NULLs to save space. However, it turns
out that SQL UNIQUE has something to say about NULLs in unique columns
so they had to be included.

surely not UNIQUE

hannu=# create table tabuniq(i int );
CREATE TABLE
hannu=# create index tabuniq_ndx on tabuniq(i);
CREATE INDEX
hannu=# insert into tabuniq values(1);
INSERT 20560497 1
hannu=# insert into tabuniq values(2);
INSERT 20560498 1
hannu=# insert into tabuniq values(null);
INSERT 20560499 1
hannu=# insert into tabuniq values(null);
INSERT 20560500 1

maybe the problem is with PRIMARY KEY

However, the machinary to decide if an index is usable assumes that
usable operators have two arguments and IS NULL isn't really an
operator in the PostgreSQL sense and doesn't have two arguments either.

*If* that can be fixed, then we can be more flexible. But if it were
easy it would have been done long ago...

sure :)

--
Hannu Krosing <hannu@skype.net>

bruce@momjian.us

over 20 years ago

In reply to: Martijn van Oosterhout (#15)

Re: Does anybody use ORDER BY x USING y?

Martijn van Oosterhout <kleptog@svana.org> writes:

On Sun, Sep 18, 2005 at 11:23:01PM -0400, Tom Lane wrote:
<snip>

class families" to relate opclasses for different datatypes. Basically
I'd like to solve most of these issues by constructing a new layer atop
opclasses, not by deciding that an opclass doesn't convey the full story
about the behavior of an index column.

The thing is that these opclasses you're describing are closely related. It
ought to be possible to use a single index to produce results in any of the
four orders you describe.

Where I'm currently going is creating a table of COLLATE orders. These
collate orders would refer to operator classes but "tweak" them. For
example, things like:

- Sort ascending or descending (descending reverses the bt*cmp test)
- NULLs first or last
- Locale for text types
- etc

These aren't all related in the same way. While it obviously isn't hard to
produce results ascending or descending, and it shouldn't be hard to produce
NULLs first or last regardless of where they appear in the index, it would be
utterly impossible to use an index built with the wrong locale collation.

--
greg

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Bruce Momjian (#19)

Re: Does anybody use ORDER BY x USING y?

Greg Stark <gsstark@mit.edu> writes:

The thing is that these opclasses you're describing are closely related. It
ought to be possible to use a single index to produce results in any of the
four orders you describe.

Wrong --- only two of them. You can't magically swap nulls from one end
of the index to the other (and Hannu's flight of fantasy about double
indexscans is just a flight of fantasy; it would be solving the problem
at entirely the wrong place).

These aren't all related in the same way.

They are all desirable properties of an index column, however. In
particular, we do have a market for genuine reverse-sort columns,
so that you can use a double-column index to get orderings like
ORDER BY x ASC, y DESC.

regards, tom lane

bruce@momjian.us

over 20 years ago

In reply to: Tom Lane (#20)

mark@mark.mielke.cc

over 20 years ago

In reply to: Bruce Momjian (#21)

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Martijn van Oosterhout (#15)