Converting an ASCII database to an UTF-8 database
Hi All,
I have a database in PostgreSQL which is ASCII.
Due to some internationalization issues, I need to convert the database
to the UTF-8 format.
So my question is:
How do I convert a database in the ASCII format into one of the UTF-8
format?
Thanks in advance
- Kishore
On f�s, 2006-02-17 at 05:21 -0800, kishore.sainath@gmail.com wrote:
Hi All,
I have a database in PostgreSQL which is ASCII.
Due to some internationalization issues, I need to convert the database
to the UTF-8 format.So my question is:
How do I convert a database in the ASCII format into one of the UTF-8
format?
using pg_dump ?
gnari
kishore.sainath@gmail.com wrote:
How do I convert a database in the ASCII format into one of the UTF-8
format?
ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between any two
encodings.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On f�s, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
kishore.sainath@gmail.com wrote:
How do I convert a database in the ASCII format into one of the UTF-8
format?ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between any two
encodings.
This will only work correctly if the database
definitely does not contain non-ASCII characters.
Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.
consider:
template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('�');
INSERT 33304378 1
test=# select * from a;
x
---
�
(1 row)
test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
x
---
�
(1 row)
test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
x
---
(1 row)
test=#
gnari
I have this exact problem. I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII. How do I fix that?
On Feb 17, 2006, at 4:08 PM, Ragnar wrote:
Show quoted text
On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
kishore.sainath@gmail.com wrote:
How do I convert a database in the ASCII format into one of the
UTF-8
format?ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between
any two
encodings.This will only work correctly if the database
definitely does not contain non-ASCII characters.Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.consider:
template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('á');
INSERT 33304378 1
test=# select * from a;
x
---
á
(1 row)test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
x
---
á
(1 row)test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
x
---(1 row)
test=#
gnari
---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.
So I tried using LATIN9 encoding for both dump and restore.
I believe that LATIN9=UTF-8 encoding base on the DOCS.
I wonder why it is like that.
Jan Cruz wrote:
I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
Certainly, considering that they are different encodings.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Rick Gigger wrote:
I have this exact problem. I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII. How do I fix that?
Well then you need to figure out what encoding those non-ASCII
characters actually are and set the client encoding to that. Trying
LATIN1 might be a good start, but if you users/applications have
inserted random bytes then you will have to sort it out by hand.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/