Move data between two databases SQL-ASCII to UTF8

Started by Noname | about 19 years ago | 4 messages | general

MargaretGillon@chromalloy.com

about 19 years ago

I need to convert my database to UTF8. Is there a way to do a SELECT ...
INSERT from the old database table to the new one? Would the INSERT
correct data errors between the two data types? I only have 10 tables and
the biggest has < 8000 rows.

Running PostgreSQL version 8.1.4 on Red Hat 9.
--
Margaret Gillon, IS Dept., Chromalloy Los Angeles, ext. 297


Clodoaldo

clodoaldo.pinto.neto@gmail.com

about 19 years ago

In reply to: Noname (#1)

Re: Move data between two databases SQL-ASCII to UTF8

2007/2/8, MargaretGillon@chromalloy.com <MargaretGillon@chromalloy.com>:

> I need to convert my database to UTF8. Is there a way to do a SELECT ...
> INSERT from the old database table to the new one? Would the INSERT correct
> data errors between the two data types? I only have 10 tables and the
> biggest has < 8000 rows.

Use pg_dump to dump the db and use iconv on the generated file:

iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

<GUESS>
If the characters are strictly ASCII (<=127) then the conversion will
not be necessary. But if there are characters bigger than 127 then the
conversion will have to be made from iso-8859-1 to utf-8:

iconv -f ISO_8859-1 -t UTF-8 mydb.dump -o mydb_utf8.dump
</GUESS>
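
For anyone who prefers to script the check, here is a minimal Python sketch of the same idea: look for bytes above 127, and re-encode only if there are any. The function names are made up for illustration, and ISO-8859-1 as the source encoding is the same guess as above, not something confirmed about this database.

```python
def needs_conversion(data: bytes) -> bool:
    """Return True if any byte is above 127, i.e. the data is not pure ASCII."""
    return any(b > 127 for b in data)


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed to be ISO-8859-1 as UTF-8.

    Assumption: the source really is ISO-8859-1. Every byte value decodes
    under that encoding, so a wrong guess gives wrong characters, not an error.
    """
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical one-line excerpt of a dump; 0xE9 is e-acute in ISO-8859-1.
sample = b"INSERT INTO t VALUES ('caf\xe9');"
if needs_conversion(sample):
    sample = latin1_to_utf8(sample)
print(sample)
```

For a real dump the bytes would come from reading the file in binary mode, for example `open("mydb.dump", "rb").read()`.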

Regards,
--
Clodoaldo Pinto Neto

Chad Wagner

chad.wagner@gmail.com

about 19 years ago

In reply to: Clodoaldo (#2)

Re: Move data between two databases SQL-ASCII to UTF8

On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:

> Use pg_dump to dump the db and use iconv on the generated file:
>
> iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump
file (if that was in fact the encoding on the original database)?

SET client_encoding TO SQL_ASCII;

And then let the database do the conversion? I would think since the db is
UTF8 and the client is claiming SQL_ASCII then it would convert the data to
UTF8.

I have done this in the past with SQL dumps that had characters that UTF8
didn't like, and I just added the "SET client_encoding TO LATIN1;" since I
knew the source encoding was LATIN1.
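
A rough Python sketch of that edit, for dumps too large to change comfortably by hand. It assumes a plain-text SQL dump handled as bytes; the helper name `declare_client_encoding` is hypothetical, and LATIN1 is only an example of a known source encoding.

```python
import re

# Matches an existing "SET client_encoding ..." statement at the start of a line.
_SET_RE = re.compile(
    rb"^SET\s+client_encoding\b[^;\n]*;", re.IGNORECASE | re.MULTILINE
)


def declare_client_encoding(dump: bytes, encoding: str) -> bytes:
    """Make the dump declare `encoding` as its client encoding.

    Replaces the first existing SET client_encoding statement if there is
    one, otherwise puts a new statement at the top of the dump.
    """
    statement = b"SET client_encoding TO " + encoding.encode("ascii") + b";"
    if _SET_RE.search(dump):
        # A function replacement avoids backslash processing of the new text.
        return _SET_RE.sub(lambda match: statement, dump, count=1)
    return statement + b"\n" + dump


# Hypothetical two-line dump excerpt.
dump = b"SET client_encoding = 'SQL_ASCII';\nSELECT 1;\n"
print(declare_client_encoding(dump, "LATIN1").decode("ascii"))
```

Working on bytes rather than text avoids having to decode a file whose encoding is the very thing in question.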

--
Chad
http://www.postgresqlforums.com/

Michael Fuhr

mike@fuhr.org

about 19 years ago

In reply to: Chad Wagner (#3)

Re: Move data between two databases SQL-ASCII to UTF8

On Thu, Feb 08, 2007 at 08:22:40PM -0500, Chad Wagner wrote:

> On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
>
>> Use pg_dump to dump the db and use iconv on the generated file:
>>
>> iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Converting the data from ASCII to UTF-8 doesn't make much sense:
if the data is ASCII then it doesn't need conversion; if the data
needs conversion then it isn't ASCII.

> Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump
> file (if that was in fact the encoding on the original database)?

http://www.postgresql.org/docs/8.2/interactive/multibyte.html#AEN24118

"If the client character set is defined as SQL_ASCII, encoding
conversion is disabled, regardless of the server's character set."

As Clodoaldo mentioned, if the data is strictly ASCII then no
conversion is necessary because the UTF-8 representation will be
the same. If you set client_encoding to SQL_ASCII and the data
contains non-ASCII characters that aren't valid UTF-8 then you'll
get the error 'invalid byte sequence for encoding "UTF8"'. In that
case set client_encoding to whatever encoding the data is really
in; likely guesses for Western European languages are LATIN1, LATIN9,
or perhaps WIN1252.
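
One way to narrow down the guess is to test the bytes before loading them. The Python sketch below first checks whether the data is already valid UTF-8 (the condition behind the error above), then decodes it under each candidate so a person can judge which rendering looks right. The mapping from PostgreSQL encoding names to Python codec names is an assumption, and the function names are illustrative only.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumed Python codec equivalents of the PostgreSQL encoding names.
CANDIDATES = {"LATIN1": "iso8859-1", "LATIN9": "iso8859-15", "WIN1252": "cp1252"}


def preview(data: bytes) -> dict:
    """Decode data under each candidate; None marks a codec that rejects it."""
    result = {}
    for pg_name, codec in CANDIDATES.items():
        try:
            result[pg_name] = data.decode(codec)
        except UnicodeDecodeError:
            result[pg_name] = None
    return result


# Hypothetical sample: byte 0xA4 is a currency sign in LATIN1 and WIN1252
# but the euro sign in LATIN9.
sample = b"price: 10 \xa4"
print(is_valid_utf8(sample))
for name, text in preview(sample).items():
    print(name, repr(text))
```

Successful decoding proves little on its own, since LATIN1 and LATIN9 accept every byte value; the preview is only useful when someone who knows the data compares the results.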

--
Michael Fuhr