change db encoding

Started by Alexander Cohenabout 22 years ago2 messagesgeneral
Jump to latest
#1Alexander Cohen
alex@toomuchspace.com

How would i go about changing a databases encoding? Is this at all
possible?

There does not seem to be much i can with ALTER DATABASE except change
its name!

--
Alexander Cohen
http://www.toomuchspace.com
(819) 348-9237
(819) 432-3443

#2Ron St-Pierre
rstpierre@syscor.com
In reply to: Alexander Cohen (#1)
Re: change db encoding

Alexander Cohen wrote:

How would i go about changing a databases encoding? Is this at all
possible?

There does not seem to be much i can with ALTER DATABASE except change
its name!

You could try to:
-pg_dump the database to file,
-drop the database,
-recreate the database with unicode encoding (createdb mynewdb -e unicode),
-check the dumped file for any 'strange' characters and change as needed,
-restore to the new database

PS I will have to do this soon to several databases. If you do convert
yours, please post a follow-up message on this thread outlining your
experience.

Here are the steps from a previous thread for a conversion to a
different language (I couldn't find the link so I'm pasting it into this
message):
-------------------------------------------------------------------------------------------------------------------------------
Re: [GENERAL] Converting SQL-ASCII encoding database to UNICODE
Jean-Michel POURE wrote:

Le Dimanche 9 Novembre 2003 19:39, Rajesh Kumar Mallah a �crit :

If so what is the process

The advantage of using a Unicode database is that UTF-8 supports/includes all
known encodings at once. Therefore, in the process of development, it can
help you save time.

When using a Unicode database, if the client does not support Unicode, it is
always possible to recode data streams on the fly with "set client_encoding =
foo_encoding". Therefore, there is no real drawback in using Unicode. It may
only be a little slower, but there is no real proof.

The process of conversion is as follows:

- dump the whole database using pg_dump:
pg_dump database_name > pg_data_dump.sql

- Do no drop the old database. Make a backup of the dump:
cp pg_data_dump.sql pg_data_dump.sql.bak

- recode the dump using the GNU recode package:
recode ..u8 pg_data_dump.sql

recode will try to guess the encoding the original ASCII database.

- Open the file in an UTF-8 editor like Kate and verify that all characters
are preserved and display well (in Hindi?). If it does not work, it may be a
problem of original encoding. Try to guess it and retry using:

cp -f source_encoding..u8.bak source_encoding..u8
recode source_encoding..u8 pg_data_dump.sql

- create an empty Unicode database:
psql template1
create database new_database with encoding=Unicode;
reload the dump: psql new_database < pg_data_dump.sql

GNU/recode is available in most GNU/Linux distributions.

By the way, as you live in India, do not hesitate to visit
http://pgadmin.postgresql.org/pgadmin3/translation.php if you can help us.

Most translations in languages for India are "stalled", do not hesitate to
take over the translation in Hindi for example.

Cheers,
Jean-Michel

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

\-------------------------------------------------------------------------------------------------------------------------------

hth
Ron