Unicode support problem
Hi all,
I'm having a problem with unicode support in postgres under linux.
The issue is that I am copying lots of data from an MS SQL Server
database via java/jdbc running on Windows XP over to a postgres database
running on linux. I've set up the postgres database as follows:
LANG=C
initdb -E UNICODE
createdb -E UNICODE
Then, while transferring the data, when my program sends a string
containing the character 0xF6 (Latin small letter o with diaeresis),
I get a JDBC exception and the following error in the server log:
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.
I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds
of things, and nothing seems to fix it.
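
For reference, a minimal Python sketch of how this character is represented in the two encodings mentioned in this thread. Python is used only for illustration and is not part of the setup described here; the variable names are hypothetical. It shows that the single byte 0xF6 is the ISO-8859-1 form of the character, while the UTF-8 form is two bytes, and that the lone byte 0xF6 is not valid UTF-8 by itself.

```python
# U+00F6, LATIN SMALL LETTER O WITH DIAERESIS
ch = "\u00f6"

latin1_bytes = ch.encode("iso-8859-1")  # one byte: 0xF6
utf8_bytes = ch.encode("utf-8")         # two bytes: 0xC3 0xB6

print(latin1_bytes.hex(), utf8_bytes.hex())

# The lone byte 0xF6 cannot be decoded as UTF-8.
try:
    latin1_bytes.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(lone_byte_is_valid_utf8)
```

This only illustrates the byte representations; it does not by itself say which component produced the server error above.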
If I set up the database as follows:
LANG=C
initdb -E iso8859_1
createdb -E iso8859_1
Then it appears to work OK - but I then get an error with character 0xE2
(Latin small letter a with circumflex):
ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1
Does anyone know how to do this correctly?
This is my environment:
Linux: Debian 3.0, kernel 2.4, running on a 2-CPU PC.
Postgres: 8.0.1 built from source, no changes to anything, running on
the linux box.
JDBC driver: postgresql-8.0-310.jdbc3.jar
Java JVM (Sun) 1.4.2_02 on Windows XP SP2.
If I run postgres on the Windows XP machine (configured for UNICODE as
above), then I don't have any errors at all. This only happens on the
linux box.
Any help in fixing this would be greatly appreciated.
Thanks,
--Jatinder Sangha
Coalition Development
"Jatinder Sangha" <js@coalitiondev.com> writes:
I've set up the postgres database as follows:
LANG=C
initdb -E UNICODE
createdb -E UNICODE
I have tried setting locale/lc_ctype to C, POSIX, iso_8859_1, all kinds
of things, and nothing seems to fix it.
You can't just pick random combinations of locale and database encoding.
Any given locale setting implies a character set encoding, and you have
to use that same encoding as the database encoding; at least if you want
encoding-dependent operations such as upper()/lower() to work. The
locale you want for Unicode (UTF8) may be named something like
"en_US.utf8". Try "locale -a" to get a list of supported locales.
regards, tom lane
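
The rule above (the locale's character set must match the database encoding) can be sketched as a small pre-flight check. This is only an illustration in Python under stated assumptions: the helper names (`locale_codeset`, `locale_matches_encoding`, `PG_ALIASES`) are hypothetical, the check looks only at the codeset suffix of the locale name (the part after the dot, as in "en_US.utf8"), and it relies on Python's own codec aliases to compare names. It does not query the operating system or the server.

```python
import codecs

# This thread uses "UNICODE" as the database encoding name for UTF-8.
# Python's codec registry does not know that name, so map it by hand
# (an assumption made for this sketch).
PG_ALIASES = {"unicode": "utf-8"}


def normalize(name):
    """Return Python's canonical codec name for an encoding name."""
    name = PG_ALIASES.get(name.lower(), name)
    return codecs.lookup(name).name  # raises LookupError if unknown


def locale_codeset(locale_name):
    """Return the codeset suffix of a name like 'en_US.utf8', or None."""
    base = locale_name.split("@", 1)[0]  # drop a modifier such as '@euro'
    if "." not in base:
        return None
    return base.split(".", 1)[1]


def locale_matches_encoding(locale_name, db_encoding):
    """True/False if the name carries a codeset, None if it does not."""
    codeset = locale_codeset(locale_name)
    if codeset is None:
        return None  # the name alone does not say which encoding is implied
    return normalize(codeset) == normalize(db_encoding)


print(locale_matches_encoding("en_US.utf8", "UNICODE"))
print(locale_matches_encoding("en_US.ISO-8859-1", "UNICODE"))
```

A result of None means the locale name carries no codeset, in which case "locale -a" and the system's locale definitions are the place to look.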
If I set up the database as follows:
LANG=C
initdb -E iso8859_1
createdb -E iso8859_1
Then it appears to work OK - but I then get an error with character 0xE2
(Latin small letter a with circumflex):
ERROR: could not convert UTF-8 character 0x00e2 to ISO8859-1
The error message says it all. You are trying to convert a UTF-8
character starting with 0x00e2 to ISO-8859-1, but no such character
exists in ISO-8859-1. All ISO-8859-1 characters, when encoded in UTF-8,
start below 0x00e0. Perhaps you mixed in ISO-8859-2 data, or characters
from some character set other than ISO-8859-1?
--
Tatsuo Ishii
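
The point about lead bytes can be checked with a short Python sketch. Python is used only for illustration and the variable names are hypothetical. It collects the first UTF-8 byte of every non-ASCII ISO-8859-1 character (U+0080 to U+00FF), shows the UTF-8 form of U+00E2, and then takes one example of a sequence that does start with byte 0xE2 (the euro sign, chosen arbitrarily) to show that it has no ISO-8859-1 equivalent.

```python
# First UTF-8 byte of every non-ASCII ISO-8859-1 character.
lead_bytes = sorted({chr(cp).encode("utf-8")[0] for cp in range(0x80, 0x100)})
print([hex(b) for b in lead_bytes])

# U+00E2 (a with circumflex) itself is two bytes in UTF-8: 0xC3 0xA2.
a_circumflex = "\u00e2".encode("utf-8")
print(a_circumflex.hex())

# A UTF-8 sequence whose first byte is 0xE2 is three bytes long.
# Example: 0xE2 0x82 0xAC is U+20AC (euro sign).
three_byte = b"\xe2\x82\xac".decode("utf-8")

# That character cannot be converted to ISO-8859-1.
try:
    three_byte.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(fits_latin1)
```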
Hi Tom,
Thanks for the reply -- yes, creating the en_US.utf8 locale and using
it fixed all of my problems.
Thanks,
--Jatinder
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: 24 February 2005 17:11
To: Jatinder Sangha
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Unicode support problem