Unicode support problem

Started by Jatinder Sangha, over 21 years ago, 4 messages, general

js@coalitiondev.com

over 21 years ago

Hi all,

I'm having a problem with unicode support in postgres under linux.

The issue is that I am copying lots of data from an MS SQL Server
database via java/jdbc running on Windows XP over to a postgres database
running on linux. I've set up the postgres database as follows:

LANG=C
initdb -E UNICODE
createdb -E UNICODE

Then, while transferring the data, whenever my program sends a string
containing the character 0xF6 (Latin small letter o with diaeresis), I
get a JDBC exception and the following error in the server log:
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.

I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds
of things, and nothing seems to fix it.
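
A note on the bytes involved, as a minimal Python sketch rather than anything taken from the thread: the character 0xF6 is a single byte only in ISO-8859-1. Assuming the JDBC driver sends text to a UNICODE database as UTF-8 (an assumption, not confirmed above), the same character arrives as a two-byte sequence, which is the kind of multibyte input a single-byte LC_CTYPE setting may fail to interpret. The helper name is_valid_utf8 is made up for this illustration.

```python
# Illustration only: how U+00F6 is represented in the two encodings
# discussed in this thread.
ch = "\u00f6"  # Latin small letter o with diaeresis

latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xF6
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xB6


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes.hex(), utf8_bytes.hex())
# The lone Latin-1 byte is not valid UTF-8; the two-byte form is.
print(is_valid_utf8(latin1_bytes), is_valid_utf8(utf8_bytes))
```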

If I set up the database as follows:
LANG=C
initdb -E iso8859_1
createdb -E iso8859_1

Then it appears to work OK - but I then get an error with character 0xE2
(Latin small letter a with circumflex):
ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1

Does anyone know how to do this correctly?

This is my environment:
LINUX: DEBIAN 3.0, KERNEL 2.4 running on a 2CPU PC.
Postgres: 8.0.1 built from source, no changes to anything, running on
the linux box.
JDBC driver: postgresql-8.0-310.jdbc3.jar
Java JVM (Sun) 1.4.2_02 on Windows XP SP2.

If I run postgres on the Windows XP machine (configured for UNICODE as
above), then I don't have any errors at all. This only happens on the
linux box.

Any help in fixing this would be greatly appreciated.
Thanks,
--Jatinder Sangha
Coalition Development

Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Jatinder Sangha (#1)

Re: Unicode support problem

"Jatinder Sangha" <js@coalitiondev.com> writes:

I've set up the postgres database as follows:

LANG=C
initdb -E UNICODE
createdb -E UNICODE

I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds
of things, and nothing seems to fix it.

You can't just pick random combinations of locale and database encoding.
Any given locale setting implies a character set encoding, and you have
to use that same encoding as the database encoding; at least if you want
encoding-dependent operations such as upper()/lower() to work. The
locale you want for Unicode (UTF8) may be named something like
"en_US.utf8". Try "locale -a" to get a list of supported locales.

regards, tom lane
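
The rule above (a locale name implies an encoding, and the database encoding has to match it) can be sketched in Python. This is a rough illustration under stated assumptions, not how PostgreSQL itself checks anything: the function names locale_codeset and compatible are hypothetical, the codeset is read from the part of the locale name after the dot, "C" and "POSIX" are treated as plain ASCII, and "UNICODE" is mapped to UTF-8 as the 8.0-era name for that encoding. Locale names without a codeset suffix are reported as unknown, because what they imply depends on the system.

```python
import codecs

# Assumption: PostgreSQL 8.0 calls its UTF-8 encoding "UNICODE".
PG_ALIASES = {"unicode": "utf-8"}


def canonical(name: str) -> str:
    """Normalize an encoding name via Python's codec registry.

    Only works for encodings Python knows; others raise LookupError.
    """
    name = name.strip().lower()
    name = PG_ALIASES.get(name, name)
    return codecs.lookup(name).name


def locale_codeset(locale_name: str):
    """Return the encoding implied by a locale name, or None if unknown."""
    if locale_name in ("C", "POSIX"):
        return "ascii"
    base = locale_name.split("@", 1)[0]  # drop a modifier such as @euro
    if "." not in base:
        return None  # e.g. "en_US": system-dependent, so not guessed here
    return canonical(base.split(".", 1)[1])


def compatible(locale_name: str, db_encoding: str) -> bool:
    """True if the locale's implied encoding matches the database encoding."""
    codeset = locale_codeset(locale_name)
    return codeset is not None and codeset == canonical(db_encoding)


for loc in ("C", "en_US.utf8", "en_US.ISO-8859-1"):
    print(loc, compatible(loc, "UNICODE"))
```

On a real system the candidate names would come from "locale -a" rather than a hard-coded list.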

Tatsuo Ishii

ishii@postgresql.org

over 21 years ago

In reply to: Jatinder Sangha (#1)

Re: Unicode support problem

If I set up the database as follows:
LANG=C
initdb -E iso8859_1
createdb -E iso8859_1

Then it appears to work OK - but I then get an error with character 0xE2
(Latin small letter a with circumflex):
ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1

The error message says it all. You are trying to convert a UTF-8
character starting with 0x00e2 to ISO-8859-1, and no such character
exists in ISO-8859-1. All ISO-8859-1 characters in UTF-8 start below
the 0x00e0 range. Perhaps you have mixed in ISO-8859-2 or some other
characters that are not ISO-8859-1?
--
Tatsuo Ishii
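
The byte ranges mentioned above can be checked with a short Python sketch. It shows that every non-ASCII ISO-8859-1 character encodes to UTF-8 with a lead byte of 0xC2 or 0xC3, so a sequence starting with 0xE2 has to be some other character. The use of U+2019 (right single quotation mark) is purely an assumed example of such a character; the thread does not say what the data actually contained.

```python
# Lead bytes of the UTF-8 encodings of all non-ASCII ISO-8859-1 characters.
lead_bytes = {chr(cp).encode("utf-8")[0] for cp in range(0x80, 0x100)}
print(sorted(hex(b) for b in lead_bytes))

# U+00E2 (a with circumflex) is in ISO-8859-1; its UTF-8 form starts with 0xC3.
a_circumflex = "\u00e2".encode("utf-8")
print(a_circumflex.hex())

# Assumed example of a character whose UTF-8 form starts with 0xE2.
quote = "\u2019"  # right single quotation mark, not in ISO-8859-1
quote_utf8 = quote.encode("utf-8")
print(quote_utf8.hex())

try:
    quote.encode("iso-8859-1")
    convertible = True
except UnicodeEncodeError:
    # Same kind of failure as the server's "could not convert" error.
    convertible = False
print(convertible)
```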

Jatinder Sangha

js@coalitiondev.com

over 21 years ago

In reply to: Tatsuo Ishii (#3)

Re: Unicode support problem

Hi Tom,

Thanks for the reply -- yes, creating the en_US.utf8 locale and using
that fixed all of my problems.

Thanks,
--Jatinder

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 24 February 2005 17:11
To: Jatinder Sangha
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unicode support problem

"Jatinder Sangha" <js@coalitiondev.com> writes:

I've set up the postgres database as follows:

LANG=C
initdb -E UNICODE
createdb -E UNICODE

I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all
kinds of things, and nothing seems to fix it.

regards, tom lane
