Confused with db client encoding
Hi,
Here is the output of a psql session. Please note that the indentation
inconsistencies in the records containing non-ASCII chars are exactly as
output by psql.
The db was created with LATIN9 and the console was run (on the same
machine) using UTF-8 (my system's default).
I was surprised to notice that setting the client to unicode (which is
what that console is using) garbled the localized chars, as I was
expecting the opposite.
On the other hand, when invoking from a Java app, running on the same
machine, the accented chars also appeared garbled.
Have I misunderstood the manual? How can I get consistent behaviour?
It was tested on a Debian/unstable box, running PostgreSQL 7.4.5-3 and
Sun's JVM 1.4.2
Thanks,
Carlos
psql session:
-----
mpb2-m16e=# \l
List of databases
Name | Owner | Encoding
-----------+----------+----------
mpb2-test | carlos | LATIN9
template0 | postgres | LATIN9
template1 | postgres | LATIN9
(3 rows)
mpb2-m16e=# select tipo_doc_id, nome, descricao from tab_tipo_doc where
tipo_doc_id < 100;
tipo_doc_id | nome | descricao
-------------+----------------------+---------------------------------------
0 | | (documento desconhecido)
1 | Encomenda | Encomendas
2 | Factura | Facturas
3 | Tx. Dinheiro | Transacções a Dinheiro
11 | Nota de Crédito | Notas de Crédito
12 | Nota de Débito | Notas de Débito
21 | G. Remessa | Guia de Remessa
91 | Saída Armazém | Saídas de Armazém
92 | Ent. Armazém | Entradas em Armazém
5 | Devolução | Devoluções de Facturas/Tx. Dinheiro
99 | Acerto Inv. | Acerto de Inventário
51 | O.T. | Ordens de Trabalho
(12 rows)
mpb2-m16e=# set client_encoding to unicode;
SET
mpb2-m16e=# select tipo_doc_id, nome, descricao from tab_tipo_doc where
tipo_doc_id < 100;
tipo_doc_id | nome | descricao
-------------+----------------------+---------------------------------------
0 | | (documento desconhecido)
1 | Encomenda | Encomendas
2 | Factura | Facturas
3 | Tx. Dinheiro | Transacções a Dinheiro
11 | Nota de Crédito | Notas de Crédito
12 | Nota de Débito | Notas de Débito
21 | G. Remessa | Guia de Remessa
91 | SaÃda Armazém | SaÃdas de Armazém
92 | Ent. Armazém | Entradas em Armazém
5 | Devolução | Devoluções de Facturas/Tx. Dinheiro
99 | Acerto Inv. | Acerto de Inventário
51 | O.T. | Ordens de Trabalho
(12 rows)
On Mon, 06 Sep 2004 00:02:24 +0100, Carlos Correia <carlos@m16e.com> wrote:
Hi,
Here is the output of a psql session. Please note that the indentation
inconsistencies in the records containing non-ASCII chars are exactly as
output by psql.
The db was created with LATIN9 and the console was run (on the same
machine) using UTF-8 (my system's default).
I was surprised to notice that setting the client to unicode (which is
what that console is using) garbled the localized chars, as I was
expecting the opposite.
On the other hand, when invoking from a Java app, running on the same
machine, the accented chars also appeared garbled.
(...)
3 | Tx. Dinheiro | Transacções a Dinheiro
11 | Nota de Crédito | Notas de Crédito
12 | Nota de Débito | Notas de Débito
21 | G. Remessa | Guia de Remessa
It looks like this data was entered as UTF-8 while the client encoding
was LATIN9 (or similar), meaning the two incoming bytes of each
accented character in UTF-8 were interpreted by the backend as two
individual LATINx characters.
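The byte-level effect described above can be sketched outside the database. This is only an illustration in Python, not what the backend literally runs: it assumes the terminal sends UTF-8 while client_encoding is LATIN9, and it uses Python's iso8859_15 codec as a stand-in for LATIN9. The variable names are made up for the example.

```python
# Assumption: a UTF-8 terminal talks to a backend that believes the client
# is sending LATIN9 (ISO-8859-15), so every incoming byte becomes one character.
typed = "Cr\u00e9dito"                      # "Credito" with e-acute, 7 characters

wire_bytes = typed.encode("utf-8")          # what the UTF-8 terminal sends: 8 bytes
stored = wire_bytes.decode("iso8859_15")    # what a LATIN9 backend stores: 8 characters

# client_encoding = LATIN9: the stored bytes come back untouched, and the
# UTF-8 terminal happens to render them as the original text.
seen_as_latin9_client = stored.encode("iso8859_15").decode("utf-8")

# client_encoding = UNICODE: each stored character is converted to UTF-8,
# so the two-character garbage becomes visible on the terminal.
seen_as_unicode_client = stored.encode("utf-8").decode("utf-8")

print(len(typed), len(wire_bytes), len(stored))
print(seen_as_latin9_client)
print(seen_as_unicode_client)
```

The extra character per accented letter would also account for the column misalignment in the first listing, since the stored values are longer than what the terminal displays.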
Test case (session in a UTF-8 environment):
test=# CREATE DATABASE ctest encoding 'LATIN1';
CREATE DATABASE
test=# \c ctest;
You are now connected to database "ctest".
ctest=# CREATE TABLE coding (data TEXT);
CREATE TABLE
ctest=# SET client_encoding TO LATIN1;
SET
ctest=# INSERT INTO coding VALUES('müller');
INSERT 349960 1
ctest=# SELECT * FROM coding;
data
---------
müller
(1 row)
ctest=# SET client_encoding TO UNICODE;
SET
ctest=# SELECT * FROM coding;
data
---------
mÃ¼ller
(1 row)
Ian Barwick