order by question

Started by Steve Clarkover 11 years ago5 messagesgeneral
Jump to latest
#1Steve Clark
sclark@netwolves.com

Hello,

I am confused by how postgres 8,4..13 is sorting my data.

\d test
Table "public.test"
Column | Type | Modifiers
--------+------+-----------
data | text |

select * from test order by data;
data
----------

-
--
1
11
11F
1F
a
b
C
F
-F
Feneric
Generic
(14 rows)

The first row is a single space, the next row a single -, the next two -- .
What I don't understand is why the '-F', the ' Feneric' and the ' Generic' sort where they do.

I would expect the output to be like this:

data
----------

Feneric
Generic
-
--
-F
1
11
11F
1F
a
b
C
F
(14 rows)

client_encoding
-----------------
SQL_ASCII

lc_collate
-------------
en_US.UTF-8

foxboxconfig=# show lc_ctype;
lc_ctype
-------------
en_US.UTF-8

Thanks for any clarification.

--
Stephen Clark

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Steve Clark (#1)
Re: order by question

Steve Clark <sclark@netwolves.com> wrote:

I am confused by how postgres 8,4..13 is sorting my data.

select * from test order by data;
     data
----------

   -
   --
   1
   11
   11F
   1F
   a
   b
   C
   F
   -F
   Feneric
   Generic
(14 rows)

The first row is a single space, the next row a single -, the next two -- .
What I don't understand is why the '-F', the ' Feneric' and
the ' Generic' sort where they do.

I would expect the output to be like this:

     data
----------

   Feneric
   Generic
   -
   --
   -F
   1
   11
   11F
   1F
   a
   b
   C
   F
(14 rows)

   lc_collate
-------------
   en_US.UTF-8

PostgreSQL uses the OS collations.  What you are getting matches my
Ubuntu 14.04 machine:

kgrittn@Kevin-Desktop:~/pg/master$ echo $LANG
en_US.UTF-8
kgrittn@Kevin-Desktop:~/pg/master$ sort <<XXX

   Feneric
   Generic
   -
   --
   -F
   1
   11
   11F
   1F
   a
   b
   C
   F
XXX

  -
  --
  1
  11
  11F
  1F
  a
  b
  C
  F
  -F
  Feneric
  Generic

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Steve Clark (#1)
Re: order by question

On 08/07/2014 08:53 AM, Steve Clark wrote:

On 08/07/2014 11:36 AM, Kevin Grittner wrote:

Steve Clark <sclark@netwolves.com> wrote:

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi Kevin,

Thanks for the response. I get the same thing as postgres if I sort from
the command line too. But I don't understand why.

Because that is how en_US.UTF-8 sorts. You are probably looking for the
'C' sort order. For more explanations see:

http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html

http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1

I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort
immediately after ' ' (space).

It is like the space character and the - in -Letter is ignored.

See the second link above for why that is.

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Steve Clark (#1)
Re: order by question

Steve Clark <sclark@netwolves.com> wrote:

It is like the space character and the - in -Letter is ignored.

Yes, that is how the en_US collation is defined.  I think the goal
is to make it something like "phone book" ordering.  If you still
have a "white pages" book around, look at how a business name with
a hyphen (or a hyphenated last name) is sorted there.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Steve Clark
sclark@netwolves.com
In reply to: Adrian Klaver (#3)
Re: order by question

On 08/07/2014 12:10 PM, Adrian Klaver wrote:

On 08/07/2014 08:53 AM, Steve Clark wrote:

On 08/07/2014 11:36 AM, Kevin Grittner wrote:

Steve Clark <sclark@netwolves.com> wrote:

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi Kevin,

Thanks for the response. I get the same thing as postgres if I sort from
the command line too. But I don't understand why.

Because that is how en_US.UTF-8 sorts. You are probably looking for the
'C' sort order. For more explanations see:

http://rhaas.blogspot.com/2012/03/perils-of-collation-aware-comparisons.html

http://superuser.com/questions/227925/in-utf-8-collation-why-11-is-less-then-1

I would expect '-F' to sort immediately after '-' and '--' not after 'F' as
well as ' Feneric' (spaceFeneric) and ' Generic' (spaceGeneric) to sort
immediately after ' ' (space).

It is like the space character and the - in -Letter is ignored.

See the second link above for why that is.

Thanks Adrian and Kevin,

for the explanations and links. They were very elucidating. At least now I understand what is happening.

Thanks again.

--
Stephen Clark

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general