Converting an ASCII database to an UTF-8 database

Started by Nonameabout 20 years ago8 messagesgeneral
Jump to latest
#1Noname
kishore.sainath@gmail.com

Hi All,

I have a database in PostgreSQL which is ASCII.
Due to some internationalization issues, I need to convert the database
to the UTF-8 format.

So my question is:
How do I convert a database in the ASCII format into one of the UTF-8
format?

Thanks in advance
- Kishore

#2Ragnar
gnari@hive.is
In reply to: Noname (#1)
Re: Converting an ASCII database to an UTF-8 database

On f�s, 2006-02-17 at 05:21 -0800, kishore.sainath@gmail.com wrote:

Hi All,

I have a database in PostgreSQL which is ASCII.
Due to some internationalization issues, I need to convert the database
to the UTF-8 format.

So my question is:
How do I convert a database in the ASCII format into one of the UTF-8
format?

using pg_dump ?

gnari

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Noname (#1)
Re: Converting an ASCII database to an UTF-8 database

kishore.sainath@gmail.com wrote:

How do I convert a database in the ASCII format into one of the UTF-8
format?

ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between any two
encodings.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#4Ragnar
gnari@hive.is
In reply to: Peter Eisentraut (#3)
Re: Converting an ASCII database to an UTF-8 database

On f�s, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:

kishore.sainath@gmail.com wrote:

How do I convert a database in the ASCII format into one of the UTF-8
format?

ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between any two
encodings.

This will only work correctly if the database
definitely does not contain non-ASCII characters.

Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.

consider:

template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('�');
INSERT 33304378 1
test=# select * from a;
x
---
�
(1 row)

test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
x
---
�
(1 row)

test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
x
---

(1 row)

test=#

gnari

#5Rick Gigger
rick@alpinenetworking.com
In reply to: Ragnar (#4)
Re: Converting an ASCII database to an UTF-8 database

I have this exact problem. I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII. How do I fix that?

On Feb 17, 2006, at 4:08 PM, Ragnar wrote:

Show quoted text

On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:

kishore.sainath@gmail.com wrote:

How do I convert a database in the ASCII format into one of the
UTF-8
format?

ASCII is a subset of UTF-8, so you don't need to do anything. Just
change the encoding entry in the pg_database table. Of course, using
pg_dump would be the official way to convert a database between
any two
encodings.

This will only work correctly if the database
definitely does not contain non-ASCII characters.

Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.

consider:

template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('á');
INSERT 33304378 1
test=# select * from a;
x
---
á
(1 row)

test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
x
---
á
(1 row)

test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
x
---

(1 row)

test=#

gnari

---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#6Jan Cruz
malebug@gmail.com
In reply to: Rick Gigger (#5)
Re: Converting an ASCII database to an UTF-8 database

I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.

So I tried using LATIN9 encoding for both dump and restore.

I believe that LATIN9=UTF-8 encoding base on the DOCS.

I wonder why it is like that.

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Jan Cruz (#6)
Re: Converting an ASCII database to an UTF-8 database

Jan Cruz wrote:

I believe PostgreSQL treat UTF-8 and LATIN9 Differently.

Certainly, considering that they are different encodings.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Rick Gigger (#5)
Re: Converting an ASCII database to an UTF-8 database

Rick Gigger wrote:

I have this exact problem. I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII. How do I fix that?

Well then you need to figure out what encoding those non-ASCII
characters actually are and set the client encoding to that. Trying
LATIN1 might be a good start, but if you users/applications have
inserted random bytes then you will have to sort it out by hand.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/