sorting problem

Started by Jamie Deppelerover 21 years ago16 messagesgeneral
Jump to latest
#1Jamie Deppeler
jamie@doitonce.net.au

Problem i am having at the moment i cant get a true alpha sort to work
as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

SQL Query

SELECT
*
FROM
person
WHERE
(salutation LIKE '%To%')
ORDER BY
person.lastname

Results

Ahsteit
Bloggs
Cap
Carrey
Diver
Duckula
Goldsworthy
Gruff
Harmony
Hassleberry-flop-flop
Heyheagle
Jahoosal
Straindove
Yorrick
of Finchery

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Jamie Deppeler (#1)
Re: sorting problem

Jamie Deppeler wrote:

Problem i am having at the moment i cant get a true alpha sort to
work as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

Initialize the database cluster with a locale setting other than "C".

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#3Michael Fuhr
mike@fuhr.org
In reply to: Jamie Deppeler (#1)
Re: sorting problem

On Fri, Dec 17, 2004 at 11:28:36AM +1100, Jamie Deppeler wrote:

Problem i am having at the moment i cant get a true alpha sort to work
as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

ORDER BY LOWER(person.lastname)

or

ORDER BY UPPER(person.lastname)

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#4Michael Fuhr
mike@fuhr.org
In reply to: Peter Eisentraut (#2)
Re: sorting problem

On Fri, Dec 17, 2004 at 01:45:36AM +0100, Peter Eisentraut wrote:

Jamie Deppeler wrote:

Problem i am having at the moment i cant get a true alpha sort to
work as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

Initialize the database cluster with a locale setting other than "C".

Hmmm...did I misunderstand something when I recommended using
ORDER BY LOWER(person.lastname)?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

#5Chris Smith
chris@interspire.com
In reply to: Michael Fuhr (#3)
Re: sorting problem

Would doing it this way require an index:

create index lower_lastname on table x lower(lastname);

?

Regards,

Chris Smith

Suite 30, 45-51 Huntley St, Alexandria, NSW 2015 Australia

Ph: +61 2 9517 2505
Fx: +61 2 9517 1915

email: info@interspire.com
web: www.interspire.com

Michael Fuhr wrote:

On Fri, Dec 17, 2004 at 11:28:36AM +1100, Jamie Deppeler wrote:

Problem i am having at the moment i cant get a true alpha sort to work
as Order By is sorting A..Z then a..z where i need aA..zZ sort
independant of case.

ORDER BY LOWER(person.lastname)

or

ORDER BY UPPER(person.lastname)

--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.296 / Virus Database: 265.5.4 - Release Date: 12/15/2004

#6Bruce Momjian
bruce@momjian.us
In reply to: Chris Smith (#5)
Re: sorting problem

Chris Smith <chris@interspire.com> writes:

Would doing it this way require an index:

create index lower_lastname on table x lower(lastname);

Well it doesn't *require* but it may be a good idea. It depends on your
queries. It will NOT be useful for a query like:

select * from x order by lower(lastname)

where postgres won't bother with the index since it will be slower than just
resorting the entire table. The way this index is useful is if you have
queries of the form:

select * from x where lower(lastname) between ? and ? order by lower(lastname)

or

select * from x order by lower(lastname) offset ? limit ?

Though this will eventually switch to sorting when the offset is large.
Better is to use something like:

select * from x where lower(lastname) > ? order by lower(lastname) limit ?

or perhaps something like this if a merge join with fast start is useful:

select * from x join y on (x.lower(lastname)=y.lower(lastname))

But

--
greg

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Fuhr (#4)
Re: sorting problem

Am Freitag, 17. Dezember 2004 01:59 schrieb Michael Fuhr:

Initialize the database cluster with a locale setting other than "C".

Hmmm...did I misunderstand something when I recommended using
ORDER BY LOWER(person.lastname)?

Those are two different ways to achieve the same effect in this particular
case.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#8Bruno Wolff III
bruno@wolff.to
In reply to: Bruce Momjian (#6)
Re: sorting problem

On Thu, Dec 16, 2004 at 23:33:00 -0500,
Greg Stark <gsstark@mit.edu> wrote:

Chris Smith <chris@interspire.com> writes:

Would doing it this way require an index:

create index lower_lastname on table x lower(lastname);

Well it doesn't *require* but it may be a good idea. It depends on your
queries. It will NOT be useful for a query like:

select * from x order by lower(lastname)

where postgres won't bother with the index since it will be slower than just
resorting the entire table. The way this index is useful is if you have
queries of the form:

Using an index to do an order by is an order N operation. Doing a sort
is an order N log N operation. For large values of N, a index will be
faster. The index will slow down write operations, so it may still be
a bad idea in some cases.

#9Bruce Momjian
bruce@momjian.us
In reply to: Bruno Wolff III (#8)
Re: sorting problem

Bruno Wolff III <bruno@wolff.to> writes:

Using an index to do an order by is an order N operation.

No, using an index to do an order by is actually still n*log(n). You have to
traverse all the parent pages in the binary tree of the index as well.

This only goes to show how small the log(n) component of the order is. It's
easily dwarfed by large constants such as the difference in i/o efficiency
from non-contiguous reads.

--
greg

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#8)
Re: sorting problem

Bruno Wolff III <bruno@wolff.to> writes:

Greg Stark <gsstark@mit.edu> wrote:

where postgres won't bother with the index since it will be slower than just
resorting the entire table.

Using an index to do an order by is an order N operation. Doing a sort
is an order N log N operation. For large values of N, a index will be
faster.

Unfortunately not ... the constant factors are such that the index
solution isn't very competitive at large N, unless the table is already
well ordered (ie clustered). The sort code is a lot better at avoiding
random-seeks-all-over-the-disk syndrome.

I suppose your argument is good as N goes to infinity, but for
real-world cases we don't seem to reach the asymptotic regime.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: sorting problem

Greg Stark <gsstark@mit.edu> writes:

Bruno Wolff III <bruno@wolff.to> writes:

Using an index to do an order by is an order N operation.

No, using an index to do an order by is actually still n*log(n). You have to
traverse all the parent pages in the binary tree of the index as well.

Only if you searched afresh from the root for each key, which an
indexscan is not going to do.

regards, tom lane

#12Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Tom Lane (#10)
Re: sorting problem

--=======6C297FEF=======
Content-Type: text/plain; x-avg-checked=avg-ok-43C64EFF; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 8bit

At 12:14 PM 12/17/2004 -0500, Tom Lane wrote:

Bruno Wolff III <bruno@wolff.to> writes:

Greg Stark <gsstark@mit.edu> wrote:

where postgres won't bother with the index since it will be slower

than just

resorting the entire table.

Using an index to do an order by is an order N operation. Doing a sort
is an order N log N operation. For large values of N, a index will be
faster.

Unfortunately not ... the constant factors are such that the index
solution isn't very competitive at large N, unless the table is already
well ordered (ie clustered). The sort code is a lot better at avoiding
random-seeks-all-over-the-disk syndrome.

Which would involve reading less data?

In some cases I'd like to use the method that is more likely to fit within RAM.

Also if there are multiple parallel queries, one could end up with
something similar to random-seeks, so reading/writing less data could still
be better.

Regards,
Link.

--=======6C297FEF=======--

#13Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: sorting problem

Tom Lane <tgl@sss.pgh.pa.us> writes:

Greg Stark <gsstark@mit.edu> writes:

Bruno Wolff III <bruno@wolff.to> writes:

Using an index to do an order by is an order N operation.

No, using an index to do an order by is actually still n*log(n). You have to
traverse all the parent pages in the binary tree of the index as well.

Only if you searched afresh from the root for each key, which an
indexscan is not going to do.

Isn't that still nlog(n)? In the end you're going to have read in every page
of the index including all those non-leaf pages. Aren't there nlog(n) pages?

--
greg

#14Bruno Wolff III
bruno@wolff.to
In reply to: Bruce Momjian (#13)
Re: sorting problem

On Fri, Dec 17, 2004 at 15:36:58 -0500,
Greg Stark <gsstark@mit.edu> wrote:

Isn't that still nlog(n)? In the end you're going to have read in every page
of the index including all those non-leaf pages. Aren't there nlog(n) pages?

The depth of the tree is log N, but there are only N nodes. I think you
can treat the amount of information at each node as constant.

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruno Wolff III (#14)
Re: sorting problem

Bruno Wolff III <bruno@wolff.to> writes:

Greg Stark <gsstark@mit.edu> wrote:

Isn't that still nlog(n)? In the end you're going to have read in every page
of the index including all those non-leaf pages. Aren't there nlog(n) pages?

The depth of the tree is log N, but there are only N nodes. I think you
can treat the amount of information at each node as constant.

Besides, we don't read any non-leaf pages except the ones down the left
edge of the btree. An indexscan just visits the leaf pages using their
sibling links to get from one to the next. (If it did otherwise, we'd
not produce the output in sorted order, which would make this discussion
moot...)

regards, tom lane

#16Bruno Wolff III
bruno@wolff.to
In reply to: Bruno Wolff III (#14)
Re: sorting problem

On Fri, Dec 17, 2004 at 15:12:18 -0600,
Bruno Wolff III <bruno@wolff.to> wrote:

On Fri, Dec 17, 2004 at 15:36:58 -0500,
Greg Stark <gsstark@mit.edu> wrote:

Isn't that still nlog(n)? In the end you're going to have read in every page
of the index including all those non-leaf pages. Aren't there nlog(n) pages?

The depth of the tree is log N, but there are only N nodes. I think you
can treat the amount of information at each node as constant.

That should have been 2N-1 nodes.