upgrading to 8.3, utf-8 and latin2 locale problem

Started by Mage about 18 years ago (3 messages, general list)
#1 Mage
mage@mage.hu

Hello,

I am sure this won't be the first e-mail about this issue; however, we
are upgrading a production-like environment, so please help.

To reproduce the problem, I used two Debian testing servers with the same
locales installed (en_US.UTF-8, en_US ISO-8859-1, hu_HU.UTF-8, hu_HU ISO-8859-2).

------------------------------------------------
PostgreSQL 8.2 (8.2.6-2):

/usr/lib/postgresql/8.2/bin/initdb -D /home/readonly/pg_data/ \
    --locale='en_US.UTF-8' --lc-collate='hu_HU.UTF-8' \
    --lc-ctype='hu_HU.UTF-8' --lc-time='hu_HU.UTF-8'
The files belonging to this database system will be owned by user "mage".
This user must also own the server process.

The database cluster will be initialized with locales
COLLATE: hu_HU.UTF-8
CTYPE: hu_HU.UTF-8
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: hu_HU.UTF-8
The default database encoding has accordingly been set to UTF8.

/usr/lib/postgresql/8.2/bin/pg_ctl -D /home/readonly/pg_data -l logfile \
    -o '-p 5555' start
/usr/lib/postgresql/8.2/bin/psql -p 5555 template1

template1=# create database test encoding = 'latin2';
CREATE DATABASE

------------------------------------------------
PostgreSQL 8.3 (8.3.0-1):

/usr/lib/postgresql/8.3/bin/initdb -D /home/readonly/pg_data/ \
    --locale='en_US.UTF-8' --lc-collate='hu_HU.UTF-8' \
    --lc-ctype='hu_HU.UTF-8' --lc-time='hu_HU.UTF-8'
The files belonging to this database system will be owned by user "mage".
This user must also own the server process.

The database cluster will be initialized with locales
COLLATE: hu_HU.UTF-8
CTYPE: hu_HU.UTF-8
MESSAGES: en_US.UTF-8
MONETARY: en_US.UTF-8
NUMERIC: en_US.UTF-8
TIME: hu_HU.UTF-8
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "hungarian".

/usr/lib/postgresql/8.3/bin/pg_ctl -D /home/readonly/pg_data -l logfile \
    -o '-p 5555' start
/usr/lib/postgresql/8.3/bin/psql -p 5555 template1

template1=# create database test encoding = 'latin2';
ERROR: encoding LATIN2 does not match server's locale hu_HU.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

Searching Google, we found similar error messages reported for pg_upgradecluster.
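The 8.3 error above comes from a check that the requested database encoding agrees with the codeset of the server's LC_CTYPE, while a C or POSIX locale imposes no encoding. The following is a minimal sketch of that rule under a simplified model: the codeset table, the function names, and the "parse the text after the dot" approach are illustrative assumptions, not PostgreSQL's code (as far as we know, the server queries the C library for the locale's codeset rather than parsing its name).

```python
# Simplified, hypothetical model of the PostgreSQL 8.3 rule that rejects
# CREATE DATABASE ... ENCODING when the encoding disagrees with LC_CTYPE.
# Assumptions (not the server's real code): the codeset is taken from the
# text after "." in the locale name, and only the codesets below are known.
from typing import Optional

# Hypothetical table: locale codeset spelling -> PostgreSQL encoding name.
CODESET_TO_ENCODING = {
    "utf-8": "UTF8",
    "utf8": "UTF8",
    "iso-8859-1": "LATIN1",
    "iso88591": "LATIN1",
    "iso-8859-2": "LATIN2",
    "iso88592": "LATIN2",
}


def required_encoding(lc_ctype: str) -> Optional[str]:
    """Encoding demanded by an LC_CTYPE value, or None if any is accepted."""
    if lc_ctype in ("C", "POSIX"):
        return None  # C/POSIX: byte-oriented rules, no encoding is imposed
    _, _, codeset = lc_ctype.partition(".")
    # Simplification: an unrecognized codeset is treated as unrestricted here.
    return CODESET_TO_ENCODING.get(codeset.lower())


def is_allowed(encoding: str, lc_ctype: str) -> bool:
    """True if this model would accept the encoding under the given LC_CTYPE."""
    needed = required_encoding(lc_ctype)
    return needed is None or needed == encoding.upper()


def check_create_database(encoding: str, lc_ctype: str) -> None:
    """Raise with a message modeled on the 8.3 error if the pair is rejected."""
    if not is_allowed(encoding, lc_ctype):
        raise ValueError(
            f"encoding {encoding.upper()} does not match server's locale "
            f"{lc_ctype}; LC_CTYPE requires encoding {required_encoding(lc_ctype)}"
        )


if __name__ == "__main__":
    print(is_allowed("UTF8", "hu_HU.UTF-8"))    # True
    print(is_allowed("LATIN2", "hu_HU.UTF-8"))  # False: the failing 8.3 case
    print(is_allowed("LATIN2", "C"))            # True: C locale accepts any encoding
```

In this model, the 8.2 cluster's behavior corresponds to skipping `check_create_database` entirely, which is why the same `create database test encoding = 'latin2'` succeeded there.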

----------------

On both servers, the relevant rows of "show all;" are:
client_encoding | UTF8
lc_collate | hu_HU.UTF-8
lc_ctype | hu_HU.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | hu_HU.UTF-8
server_encoding | UTF8

We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
databases. Any idea?

Mage

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mage (#1)
Re: upgrading to 8.3, utf-8 and latin2 locale problem

Mage <mage@mage.hu> writes:

> We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
> databases. Any idea?

If you were running with a non-C database locale, that was always
broken in 8.1, and you are very fortunate not to have stumbled across
any of the failure cases.

You can either standardize on UTF8 for all your databases (note that
this does not stop your *clients* from using LATIN2 if they want),
or use C locale which will work equally poorly with all encodings ;-)

regards, tom lane
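Both halves of this advice can be illustrated with Python's codecs, assuming Python's `iso8859_2` codec stands in for PostgreSQL's LATIN2 and its `utf-8` codec for UTF8. The sketch shows that LATIN2 bytes are not valid UTF-8 (so a UTF-8 locale's routines would misread them), and that a UTF8 database serving a LATIN2 client via client-side conversion round-trips Hungarian text without loss. The test phrase and helper name are illustrative only.

```python
# Illustration only: Python codecs stand in for PostgreSQL's LATIN2 and UTF8.
TEXT = "árvíztűrő tükörfúrógép"  # Hungarian test phrase; every char exists in LATIN2

latin2_bytes = TEXT.encode("iso8859_2")
utf8_bytes = TEXT.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8, as a UTF-8 locale would assume."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Why a LATIN2 database under a UTF-8 LC_CTYPE is fragile: locale routines
# (for example strcoll() during sorting) would receive bytes that are not UTF-8.
print(is_valid_utf8(latin2_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True

# Why a UTF8 database can still serve LATIN2 clients: with client_encoding set
# to LATIN2, text is converted on the way in and on the way out, and the round
# trip is lossless for anything LATIN2 can represent.
stored = latin2_bytes.decode("iso8859_2").encode("utf-8")  # client -> server
returned = stored.decode("utf-8").encode("iso8859_2")      # server -> client
print(stored == utf8_bytes, returned == latin2_bytes)      # True True
```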

#3 Mage
mage@mage.hu
In reply to: Tom Lane (#2)
Re: upgrading to 8.3, utf-8 and latin2 locale problem

Tom Lane wrote:

> Mage <mage@mage.hu> writes:
>
>> We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
>> databases. Any idea?
>
> If you were running with a non-C database locale, that was always
> broken in 8.1, and you are very fortunate not to have stumbled across
> any of the failure cases.
>
> You can either standardize on UTF8 for all your databases (note that
> this does not stop your *clients* from using LATIN2 if they want),
> or use C locale which will work equally poorly with all encodings ;-)

If it were up to me, I'd never use LATIN2. I switched to Unicode years ago.
Some of our databases don't belong to me and I can't modify their clients.

What is the proper use of "create database xxxx encoding = 'yyy'" in
PostgreSQL 8.3? If I understand you correctly, I should avoid it entirely:
convert every affected database dump to UTF-8, load the dumps, and then use
"alter database xxx set client_encoding = 'latin2'". Is that right?

Mage
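A note on the conversion step: pg_dump's plain-text output normally declares its encoding near the top with a line such as SET client_encoding = 'LATIN2';, and the server honors that setting, so loading an unconverted LATIN2 dump into a UTF8 database may already convert the data on the way in. If the dumps are converted by hand instead, that declaration has to change together with the bytes. The following is a minimal sketch of such a conversion, assuming plain SQL dumps (not custom-format archives) small enough to fit in memory; the helper name and sample data are hypothetical.

```python
import re

# Hypothetical helper, not part of PostgreSQL: re-encode a plain-text pg_dump
# file from LATIN2 to UTF-8 and rewrite its "SET client_encoding" line so the
# declared encoding matches the new bytes.
SET_CLIENT_ENCODING = re.compile(r"^SET client_encoding = '[^']*';$", re.MULTILINE)


def latin2_dump_to_utf8(dump: bytes) -> bytes:
    """Return the dump re-encoded as UTF-8 with an updated encoding declaration."""
    text = dump.decode("iso8859_2")  # Python's codec maps all 256 byte values
    text = SET_CLIENT_ENCODING.sub("SET client_encoding = 'UTF8';", text)
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "SET client_encoding = 'LATIN2';\nINSERT INTO t VALUES ('tűrő');\n"
    converted = latin2_dump_to_utf8(sample.encode("iso8859_2"))
    print(converted.decode("utf-8"))
```

For large dumps, a line-by-line version of the same logic would avoid holding the whole file in memory.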