Issue on Varchar Ordering
Pg-Hackers,
I'm having the following issue:
create table aluno (id serial, nome varchar(60));
rodrigo=# select * from aluno order by nome offset 35 limit 5;
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 registros)
The problem is that records with id 38 and 40 might come before the others.
It seems that the spacing isn't considered on ordering.
The encoding (LATIN1) is correct according to Brazilian Portuguese, and the
settings are listed below.
rodrigo=# select name, setting from pg_settings where name ~
'encoding|^lc_|version';
name | setting
-----------------+---------
client_encoding | LATIN1
lc_collate | pt_BR
lc_ctype | pt_BR
lc_messages | pt_BR
lc_monetary | pt_BR
lc_numeric | pt_BR
lc_time | pt_BR
server_encoding | LATIN1
server_version | 8.1.3
(9 registros)
The server is a Debian 3.1 GNU/Linux with locales pt_BR.
What could be wrong? Thanks in advance.
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br
I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:
edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA
edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)
--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324
Yeah, I think there's a problem on Linux locales.
Using the C locale, it works as expected:
rodrigo@asgard:~$ export LC_ALL=C && cat test.txt | sort
GABRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES
But when using Brazilian Portuguese, it gives:
rodrigo@asgard:~$ export LC_ALL=pt_BR && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.UTF-8 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
rodrigo@asgard:~$ export LC_ALL=pt_BR.ISO-8859-1 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA
The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.
Did anyone ever had the same problem on Linux?
2006/3/29, Jonah H. Harris <jonah.harris@gmail.com>:
I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZAedb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICAedb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.
You're confusing locale and encoding, which are two different (though
interrelated) things.
I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.
regards, tom lane
Thanks all. I'm gonna try other locales and encodings.
2006/3/29, Tom Lane <tgl@sss.pgh.pa.us>:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.You're confusing locale and encoding, which are two different (though
interrelated) things.I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.regards, tom lane
--
Regards,
Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br