Confused with db client encoding

Started by Carlos Correia, over 21 years ago, 2 messages, general

carlos@m16e.com

over 21 years ago

Hi,

Here is the output of a psql session. Please note that the indentation
inconsistencies in the records containing non-ASCII chars are exactly as
output by psql.

The db was created with LATIN9 and the console was run (on the same
machine) using UTF-8 (my system's default).

I was surprised to notice that setting the client encoding to unicode
(which is what that console is using) garbled the localized chars, as I
was expecting the opposite.

On the other hand, when invoking from a Java app running on the same
machine, the accented chars also appeared garbled.

Have I misunderstood the manual? How can I get consistent behaviour?

This was tested on a Debian/unstable box, running PostgreSQL 7.4.5-3 and
Sun's JVM 1.4.2.

Thanks,

Carlos

Ian Lawrence Barwick

barwick@gmail.com

over 21 years ago

In reply to: Carlos Correia (#1)

Re: Confused with db client encoding

On Mon, 06 Sep 2004 00:02:24 +0100, Carlos Correia <carlos@m16e.com> wrote:

Hi,

Here is the output of a psql session. Please note that the indentation
inconsistencies in the records containing non-ASCII chars are exactly as
output by psql.

The db was created with LATIN9 and the console was run (on the same
machine) using UTF-8 (my system's default).

I was surprised to notice that setting the client encoding to unicode
(which is what that console is using) garbled the localized chars, as I
was expecting the opposite.

On the other hand, when invoking from a Java app running on the same
machine, the accented chars also appeared garbled.

(...)

3 | Tx. Dinheiro | TransacÃ§Ãµes a Dinheiro
11 | Nota de CrÃ©dito | Notas de CrÃ©dito
12 | Nota de DÃ©bito | Notas de DÃ©bito
21 | G. Remessa | Guia de Remessa

It looks like this data was entered as UTF-8 while the client encoding
was LATIN9 (or whatever), meaning the two incoming bytes of each
accented character in UTF-8 were interpreted by the backend as two
individual single-byte characters in LATINx.
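
The byte-level effect can be reproduced outside the database. The following is only a minimal Python sketch, not anything PostgreSQL itself runs: the helper name misread_utf8_as_latin1 is made up for illustration, and Python's "latin-1" codec (ISO-8859-1) stands in for LATIN9, on the assumption that the two agree for the characters shown here.

```python
# Sketch: UTF-8 bytes that get labelled as Latin-1 turn every accented
# character into two separate characters.

def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "Nota de Crédito"
garbled = misread_utf8_as_latin1(original)

print(original.encode("utf-8"))      # the accented letter occupies two bytes
print(garbled)                       # each of those bytes shown as its own character
print(len(original), len(garbled))   # the garbled string is one character longer
```

The garbled form has the same shape as the rows quoted above, with one extra character per accented letter.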

Test case (session in a UTF-8 environment):

test=# CREATE DATABASE ctest encoding 'LATIN1';
CREATE DATABASE
test=# \c ctest;
You are now connected to database "ctest".
ctest=# CREATE TABLE coding (data TEXT);
CREATE TABLE
ctest=# SET client_encoding TO LATIN1;
SET
ctest=# INSERT INTO coding VALUES('müller');
INSERT 349960 1
ctest=# SELECT * FROM coding;
data
---------
müller
(1 row)

ctest=# SET client_encoding TO UNICODE;
SET
ctest=# SELECT * FROM coding;
data
---------
mÃ¼ller
(1 row)
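
The session above can be modelled roughly in Python. This is a simplified sketch under stated assumptions: the functions store and display are hypothetical names, the database is treated as holding characters, and client_encoding is modelled as nothing more than the label the server applies to bytes arriving from, or going to, a UTF-8 terminal.

```python
# Sketch of the psql session: a UTF-8 terminal talking to a server that
# trusts whatever client_encoding says about the bytes on the wire.

def store(typed: str, terminal: str, client_encoding: str) -> str:
    """Characters the server ends up storing for text typed in the terminal."""
    wire_bytes = typed.encode(terminal)        # what the terminal really sends
    return wire_bytes.decode(client_encoding)  # how the server interprets it


def display(stored: str, client_encoding: str, terminal: str) -> str:
    """Text the terminal shows when the stored characters are selected."""
    wire_bytes = stored.encode(client_encoding)  # server converts for the client
    return wire_bytes.decode(terminal)           # terminal decodes what it gets


# INSERT with client_encoding LATIN1 from a UTF-8 terminal
stored = store("müller", terminal="utf-8", client_encoding="latin-1")

# SELECT with client_encoding LATIN1: the two mislabellings cancel out
print(display(stored, client_encoding="latin-1", terminal="utf-8"))

# SELECT with client_encoding UNICODE: the stored characters are shown as they are
print(display(stored, client_encoding="utf-8", terminal="utf-8"))

# INSERT and SELECT with client_encoding matching the terminal
clean = store("müller", terminal="utf-8", client_encoding="utf-8")
print(display(clean, client_encoding="utf-8", terminal="utf-8"))
```

In this model the data only round-trips consistently when client_encoding names the encoding the client is really using, which here would be UNICODE for both the INSERT and the SELECT.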

Ian Barwick