BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem

Started by Stanislav Sukholet over 20 years ago (3 messages, bugs list)
#1 Stanislav Sukholet
ctac113@mail.ru

The following bug has been logged online:

Bug reference: 1976
Logged by: Stanislav Sukholet
Email address: ctac113@mail.ru
PostgreSQL version: 7.4.8.1.FC3.1
Operating system: 2.6.12-1.1378_FC3
Description: steps to reproduce BUG #1438: Non UTF-8 client encoding
problem
Details:

It is easy to reproduce:
$ export LANG=ru_RU.koi8r
$ createdb -E UNICODE mydb
$ psql -d mydb
mydb=# \encoding KOI8
mydb=# create table a (aa integer);
CREATE TABLE
mydb=# create table b (bb integer primary key);
ERROR: ignoring unconvertible UTF-8 character 0xd3cf
mydb=# \d
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | a    | table | postgres
(1 row)

mydb=#

So the problem occurs every time I add a PRIMARY KEY modifier to a column
declaration while the client encoding is KOI8.
I have also filed this report in Red Hat Bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171174

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stanislav Sukholet (#1)
Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem

"Stanislav Sukholet" <ctac113@mail.ru> writes:

mydb=# create table b (bb integer primary key);
ERROR: ignoring unconvertible UTF-8 character 0xd3cf

Can't reproduce this here. What locale settings are you using in the
database? (Particularly lc_ctype and lc_messages)

regards, tom lane

#3 Bill Shui
bill.shui@gmail.com
In reply to: Stanislav Sukholet (#1)
Re: BUG #1976: steps to reproduce BUG #1438: Non UTF-8 client encoding problem

Hi,

I have the following scenario. I have two boxes: one running Windows
Server 2003 and one running Debian sarge Linux.

The Debian box runs the PostgreSQL server, and the Windows box uses a
Chinese character set.

If I want to build an application on Windows (through ODBC), should
I connect to the server with the client encoding set to EUC_CN or UNICODE?

On the server side, should I run initdb -E with EUC_CN or UNICODE?

Also, regarding the locale setting: should I set --locale=zh_CN.UTF-8?

Thanks.
Bill

On 20/10/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stanislav Sukholet <ctac@osib.so-cdu.ru> writes:

Can't reproduce this here. What locale settings are you using in the
database? (Particularly lc_ctype and lc_messages)

mydb=> SHOW client_encoding ;
client_encoding
-----------------
KOI8
(1 row)

mydb=> show LC_CTYPE;
lc_ctype
-------------
ru_RU.koi8r
(1 row)

mydb=> show LC_MESSAGES;
lc_messages
-------------
ru_RU.koi8r
(1 row)

mydb=> CREATE TABLE a (b INTEGER PRIMARY KEY);
ERROR: ignoring unconvertible UTF-8 character 0xd3cf

OK, with that I can reproduce it in 7.4, but more recent releases
produce a bunch of "WARNING: ignoring unconvertible UTF-8 character"
notices and then complete the operation successfully.

This is basically the same problem discussed in this thread:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00037.php
namely that gettext() converts the translated error message to the
encoding implied by LC_CTYPE ... but the error reporting machinery
expects the string to be in the encoding specified for the database.
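A minimal Python sketch of that mismatch, assuming LC_CTYPE=ru_RU.koi8r and a UNICODE (UTF-8) database. It models only the byte handling, not PostgreSQL's actual code, and the function names are hypothetical. The bytes 0xd3 0xcf from the error message are the KOI8-R encoding of the Cyrillic letters "со", and that pair is not a valid UTF-8 sequence, so treating it as database-encoded text fails.

```python
# Hypothetical model of the encoding mismatch, not PostgreSQL source code.
# Assumptions: LC_CTYPE=ru_RU.koi8r (so gettext output is KOI8-R bytes),
# database encoding UNICODE (UTF-8), client encoding KOI8.


def gettext_output(translated: str, lc_ctype_codec: str) -> bytes:
    """Bytes of a translated message as encoded for the LC_CTYPE locale."""
    return translated.encode(lc_ctype_codec)


def send_to_client(raw: bytes, db_codec: str, client_codec: str) -> bytes:
    """Treat raw as database-encoded text and re-encode it for the client.

    Raises UnicodeDecodeError if raw is not valid in db_codec, which is
    roughly the situation reported as "unconvertible UTF-8 character".
    """
    return raw.decode(db_codec).encode(client_codec)


if __name__ == "__main__":
    # KOI8-R locale: the translated bytes are not valid UTF-8.
    koi8_bytes = gettext_output("со", "koi8_r")
    print(koi8_bytes.hex())  # d3cf
    try:
        send_to_client(koi8_bytes, "utf-8", "koi8_r")
    except UnicodeDecodeError as exc:
        print("unconvertible:", exc.reason)

    # UTF-8 locale matching the UTF-8 database: the conversion succeeds.
    utf8_bytes = gettext_output("со", "utf-8")
    print(send_to_client(utf8_bytes, "utf-8", "koi8_r").hex())  # d3cf
```

The only difference between the failing and the working run is the codec assumed for LC_CTYPE, which is the point being made here: the locale's encoding and the database encoding have to agree.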

I have applied a minor tweak to the 7.4 branch to make it behave more
like the later releases, ie you get a WARNING not an ERROR. However
this is certainly not really a solution --- the only reason the behavior
isn't worse is that the ru_RU message catalog doesn't try to translate
"ignoring unconvertible UTF-8 character" and so you don't get into the
recursive failure discussed in the above thread.

The bottom line is that this is one of several reasons why it's a bad
idea to use a database encoding that's incompatible with the underlying
locale settings. I doubt that we'll really be able to fix that until
we replace all our dependence on the C library's locale facilities
... which is something that will probably happen someday, but don't
hold your breath waiting :-(

In short, if you want to use UTF8 database encoding, specify a
UTF8-based locale setting when you initdb. Don't try to override
the locale's encoding with a different database encoding via -E.
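A hypothetical pre-initdb sanity check illustrating this advice: compare the codeset part of the locale name with the encoding you intend to pass to -E. The codeset-to-encoding table is an illustrative assumption covering only the encodings mentioned in this thread, not PostgreSQL's own compatibility list. Treating C and POSIX as compatible with any encoding follows how PostgreSQL handles those locales.

```python
# Hypothetical helper; the mapping below is an illustrative assumption,
# not PostgreSQL's own locale/encoding compatibility table.

# Normalized locale codeset -> database encoding names accepted for it.
# UNICODE is the 7.4 name for UTF-8; later releases call it UTF8.
CODESET_TO_PG_ENCODINGS = {
    "utf8": {"UNICODE", "UTF8"},
    "koi8r": {"KOI8"},
    "euccn": {"EUC_CN"},
}


def locale_codeset(locale_name: str) -> str:
    """Normalize a locale's codeset, e.g. 'ru_RU.UTF-8' -> 'utf8'."""
    if "." not in locale_name:
        return ""
    codeset = locale_name.split(".", 1)[1].split("@", 1)[0]
    return codeset.replace("-", "").replace("_", "").lower()


def encoding_matches_locale(locale_name: str, pg_encoding: str) -> bool:
    """True if pg_encoding agrees with the locale's codeset."""
    if locale_name in ("C", "POSIX"):
        return True  # the C/POSIX locales work with any encoding
    allowed = CODESET_TO_PG_ENCODINGS.get(locale_codeset(locale_name), set())
    return pg_encoding.upper() in allowed


if __name__ == "__main__":
    # The failing setup from this thread: KOI8-R locale, UNICODE database.
    print(encoding_matches_locale("ru_RU.koi8r", "UNICODE"))  # False
    # The recommended setup: UTF-8 locale, UNICODE database.
    print(encoding_matches_locale("ru_RU.UTF-8", "UNICODE"))  # True
```

Normalizing the codeset (dropping "-" and "_", lowercasing) lets spellings such as "UTF-8" and "utf8" compare equal; any locale not in the table is reported as a mismatch rather than guessed.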

regards, tom lane
