Order by and strings
Hi!
New to the list with a question that I cannot find the answer to in the
manual or on the internet but I suspect is trivial. If somebody could
point me in the correct direction I would be greatful.
This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
x
----
a
-b
c
(3 rows)
I would expect a string that start with a hyphen to be sorted before or
after 'a' and 'c' and not between them. I have tried with a few other
characters (space, opening parenthesis, etc) but the result is the same.
What I want is the strings sorted by their ascii (or UTF-8) values,
without some "smart" heuristic. How do I accomplish this?
I cannot find this described in the manual (it should be there!) or on
the net.
/Fredric
PS. 8.2, Fedora Linux
On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
Hi!
New to the list with a question that I cannot find the answer to in
the manual or on the internet but I suspect is trivial. If somebody
could point me in the correct direction I would be greatful.This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
x
----
a
-b
c
(3 rows)
It has to do with the collation you are using
see
http://www.postgresql.org/docs/8.1/interactive/charset.html
All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by our proprietary quotation system. Quotations received via any other form of communication will not be honored.
CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other information proprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it addresses. If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified that any unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this e-mail immediately.
Thank you.
Justin Graf wrote:
On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
Hi!
New to the list with a question that I cannot find the answer to in
the manual or on the internet but I suspect is trivial. If somebody
could point me in the correct direction I would be greatful.This is what I do (condensed, of course):
# create table tmp ( x text ) ;
CREATE TABLE
# insert into tmp(x) values ('a'),('c'),('-b') ;
INSERT 0 3
# select * from tmp order by x ;
x
----
a
-b
c
(3 rows)It has to do with the collation you are using
I use locale en_US.UTF-8 but why this should affect how leading
characters in strings are ignored is beyond me.
Another, in my mind, counter-intuitive example:
# create table tmp ( x text ) ;
# insert into tmp(x) (values
('a'),('-b'),('c'),('aa'),('---a-b'),('ac'),('1'),('-2'),('+3'),('4'),('-'),('+'))
;
# select * from tmp order by x ;
x
--------
-
+
1
-2
+3
4
a
aa
---a-b
ac
-b
c
(12 rows)
In what universe would you expect this sort order? And how to make it
'sane'?
I found a work-around, "order by ascii(x),x", but this continues to
baffle me.
It seems to me that if there are any alphanumeric characters in the
string the rest are ignored in the sort.
Where did this rule come from?
I really would appreciate an explanation for this behavior.
/Fredric
PS. I was wrong about the server version, it is 8.3.8.
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
Justin Graf wrote:
On 2/8/2010 7:09 PM, Fredric Fredricson wrote:
Hi!
New to the list with a question that I cannot find the answer to in the
manual or on the internet but I suspect is trivial. If somebody could point
me in the correct direction I would be greatful.
It seems you're seeking ASCII or C locale sorting.
# select * from tmp order by x ;
x
--------
-
+
1
-2
+3
4
a
aa
---a-b
ac
-b
c
(12 rows)In what universe would you expect this sort order? And how to make it
'sane'?
In a library perhaps?
I found a work-around, "order by ascii(x),x", but this continues to baffle
me.
It's quite simple. en_US locale, and others like it sort by ignoring
things like white space and noise characters so that only letters and
numbers count, and things like ñ sort right near n, not at the end or
beginning of the table.
It seems to me that if there are any alphanumeric characters in the string
the rest are ignored in the sort.
Where did this rule come from?
I really would appreciate an explanation for this behavior.
It's been around quite some time. I'm afraid I'll have to defer to
some other expert on the exact history.
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
I use locale en_US.UTF-8 but why this should affect how leading characters
in strings are ignored is beyond me.
P.s. this page may shed some light on the subject:
Scott Marlowe wrote:
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:I use locale en_US.UTF-8 but why this should affect how leading characters
in strings are ignored is beyond me.P.s. this page may shed some light on the subject:
OK, thanks. It did shed some light on the subject. Only I wonder what
would happen if these sort algorithms where used on things like article
numbers in the industry. That would confuse the hell out of the people
there.
Alas, not my problem. I have a work around that works for my little part
of the universe. At least for now.
You live, you learn.
/Fredric
On Tue, Feb 9, 2010 at 11:21, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:
Scott Marlowe wrote:
On Tue, Feb 9, 2010 at 1:42 AM, Fredric Fredricson
<Fredric.Fredricson@bonetmail.com> wrote:I use locale en_US.UTF-8 but why this should affect how leading characters
in strings are ignored is beyond me.P.s. this page may shed some light on the subject:
http://en.wikipedia.org/wiki/Collation
OK, thanks. It did shed some light on the subject. Only I wonder what would
happen if these sort algorithms where used on things like article numbers in
the industry. That would confuse the hell out of the people there.
In case you are storing something like that, you may be better off
using bytea instead of text.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/