Is it bad sorting in UTF ??
Why PG sort's my data in case insensitive manner?
masterdb=# SELECT name FROM enterprises ORDER BY name;
name
--------------------------------------
................
abc
AKS 514
aks518
AKSFree11111
..........................
The worst (totally mess) example is:
masterdb=# SELECT name, replace(name, ' ', '<20>') FROM enterprises
ORDER BY name;
name | replace
--------------------------------------
+--------------------------------------------------------
...............................
alinatestfree | alinatestfree
Alina Test Free | Alina<20>Test<20>Free
alinatestfree220 | alinatestfree220
...............................
More info here:
masterdb=# select version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 8.3.7 on x86_64-redhat-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
(1 row)
masterdb=# select name, setting from pg_settings WHERE name ilike 'lc
%';
name | setting
-------------+-------------
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
(6 rows)
masterdb=# \l
List of databases
Name | Owner | Encoding
------------------------------------+----------+----------
masterdb | postgres | UTF8
...............
masterdb=# \d+ enterprises
Table "public.enterprises"
Column | Type | Modifiers |
Description
--------------------+--------------------------+-----------
+-------------
...............
name | text | |
...................
masterdb=# EXPLAIN ANALYZE SELECT name FROM enterprises ORDER BY name;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=29.62..30.68 rows=422 width=11) (actual time=1.537..1.596
rows=367 loops=1)
Sort Key: name
Sort Method: quicksort Memory: 44kB
-> Seq Scan on enterprises (cost=0.00..11.22 rows=422 width=11)
(actual time=0.012..0.181 rows=367 loops=1)
Total runtime: 1.677 ms
(5 rows)
wstrzalka wrote:
Why PG sort's my data in case insensitive manner?
masterdb=# select name, setting from pg_settings WHERE name ilike 'lc
%';
name | setting
-------------+-------------
lc_collate | en_US.UTF-8
Because that's what en_US.UTF-8 does. If you want "computer-style"
sorting choose the "C" locale. You'll need to dump all your databases an
re-initdb to do that I'm afraid. I believe 8.4 is going to allow
different locales for each database though.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
wstrzalka wrote:
Why PG sort's my data in case insensitive manner?
masterdb=# select name, setting from pg_settings WHERE name ilike 'lc
%';
name | setting
-------------+-------------
lc_collate | en_US.UTF-8Because that's what en_US.UTF-8 does. If you want "computer-style"
sorting choose the "C" locale. You'll need to dump all your databases an
re-initdb to do that I'm afraid. I believe 8.4 is going to allow
different locales for each database though.
Even if it were slower, having a way to specify that sorting be WITH a
particular collation method/locale would be appealing.
I guess this could be done with a custom operator/opclass implementing
non-libc-based sorting and collation, which is probably where PostgreSQL
will land up heading in the end.
--
Craig Ringer
At 07:57 PM 4/14/2009, Richard Huxton wrote:
wstrzalka wrote:
Why PG sort's my data in case insensitive manner?
masterdb=# select name, setting from pg_settings WHERE name ilike 'lc
%';
name | setting
-------------+-------------
lc_collate | en_US.UTF-8Because that's what en_US.UTF-8 does. If you want "computer-style"
sorting choose the "C" locale. You'll need to dump all your
databases an re-initdb to do that I'm afraid. I believe 8.4 is going
to allow different locales for each database though.
Is it possible to have something like:
select name from sometable order by collate(name,'en_US.UTF-8');
select name from sometable order by collate(name,'C');
Link.