Issue on Varchar Ordering

Started by Rodrigo Hjortalmost 20 years ago5 messages
#1Rodrigo Hjort
rodrigo.hjort@gmail.com

Pg-Hackers,

I'm having the following issue:

create table aluno (id serial, nome varchar(60));

rodrigo=# select * from aluno order by nome offset 35 limit 5;
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 registros)

The problem is that records with id 38 and 40 might come before the others.
It seems that the spacing isn't considered on ordering.
The encoding (LATIN1) is correct according to Brazilian Portuguese, and the
settings are listed below.

rodrigo=# select name, setting from pg_settings where name ~
'encoding|^lc_|version';
name | setting
-----------------+---------
client_encoding | LATIN1
lc_collate | pt_BR
lc_ctype | pt_BR
lc_messages | pt_BR
lc_monetary | pt_BR
lc_numeric | pt_BR
lc_time | pt_BR
server_encoding | LATIN1
server_version | 8.1.3
(9 registros)

The server is a Debian 3.1 GNU/Linux with locales pt_BR.

What could be wrong? Thanks in advance.

--
Regards,

Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br

#2Jonah H. Harris
jonah.harris@gmail.com
In reply to: Rodrigo Hjort (#1)
Re: Issue on Varchar Ordering

I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:

edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA

edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

#3Rodrigo Hjort
rodrigo.hjort@gmail.com
In reply to: Jonah H. Harris (#2)
Re: Issue on Varchar Ordering

Yeah, I think there's a problem on Linux locales.

Using the C locale, it works as expected:

rodrigo@asgard:~$ export LC_ALL=C && cat test.txt | sort
GABRIEL ALCIDES KLIM PERONDI
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIELA LETICIA BATISTA NUNES

But when using Brazilian Portuguese, it gives:

rodrigo@asgard:~$ export LC_ALL=pt_BR && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

rodrigo@asgard:~$ export LC_ALL=pt_BR.UTF-8 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

rodrigo@asgard:~$ export LC_ALL=pt_BR.ISO-8859-1 && cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.

Did anyone ever had the same problem on Linux?

2006/3/29, Jonah H. Harris <jonah.harris@gmail.com>:

I don't think PostgreSQL's sorting it wrong... here's the output of
Linux's sort utility:

edb82@linux:~> cat test.txt
GABRIEL ALEXANDRE DA SILVA MANICA
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALCIDES KLIM PERONDI
GABRIELA JACOBY NOS
GABRIELA HELEDA DE SOUZA

edb82@linux:~> cat test.txt | sort
GABRIELA HELEDA DE SOUZA
GABRIELA JACOBY NOS
GABRIEL ALCIDES KLIM PERONDI
GABRIELA LETICIA BATISTA NUNES
GABRIEL ALEXANDRE DA SILVA MANICA

edb82@linux:~> psql -c "SELECT * FROM aluno ORDER BY nome" postgres
id | nome
----+-----------------------------------
36 | GABRIELA HELEDA DE SOUZA
37 | GABRIELA JACOBY NOS
38 | GABRIEL ALCIDES KLIM PERONDI
39 | GABRIELA LETICIA BATISTA NUNES
40 | GABRIEL ALEXANDRE DA SILVA MANICA
(5 rows)

--
Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
732.331.1324

--
Regards,

Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rodrigo Hjort (#3)
Re: Issue on Varchar Ordering

"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:

The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.

You're confusing locale and encoding, which are two different (though
interrelated) things.

I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.

regards, tom lane

#5Rodrigo Hjort
rodrigo.hjort@gmail.com
In reply to: Tom Lane (#4)
Re: Issue on Varchar Ordering

Thanks all. I'm gonna try other locales and encodings.

2006/3/29, Tom Lane <tgl@sss.pgh.pa.us>:

"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:

The fact is that I can't raise PostgreSQL with C encoding, as special
characters are used on most tables.

You're confusing locale and encoding, which are two different (though
interrelated) things.

I suspect the right choice for you will be to use C locale with UTF8
encoding, or possibly one of the LATINn encodings.

regards, tom lane

--
Regards,

Rodrigo Hjort
GTI - Projeto PostgreSQL
CELEPAR - Cia de Informática do Paraná
http://www.pr.gov.br