How do I change the server encoding?

Started by Joseph Shraibman almost 23 years ago, 8 messages
#1Joseph Shraibman
jks@selectacast.net

I have a server that has LATIN1 encoding. I want to convert it to run UTF encoding. How
do I do that? Simply changing the encoding in a dump file does not work.

#2Antti Haapala
antti.haapala@iki.fi
In reply to: Joseph Shraibman (#1)
Re: How do I change the server encoding?

On Mon, 24 Feb 2003, Joseph Shraibman wrote:

I have a server that has LATIN1 encoding. I want to convert it to run UTF encoding. How
do I do that? Simply changing the encoding in a dump file does not work.

So have you done both of these:
- dropped and recreated your db with encoding 'utf-8'
- converted your dumps to utf-8 or
added set client_encoding to 'latin1' in the dumps

--
Antti Haapala

#3Philippe Kiener
philippe.kiener@eivd.ch
In reply to: Antti Haapala (#2)
Re: How do I change the server encoding?

Hello,
I have the same question as Joseph Shraibman.
I have dumped the db and created a new db with utf-8 encoding.

My database should be transformed from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get this error on tables with text:

psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
sequence found (0xe96500)
psql:tcom-database.sql:7111: lost synchronization with server, resetting
connection
psql:tcom-database.sql:7409: ERROR: copy: line 1, Invalid UNICODE character
sequence found (0xe97265)
psql:tcom-database.sql:7409: lost synchronization with server, resetting
connection
psql:tcom-database.sql:7456: ERROR: copy: line 3, Invalid UNICODE character
sequence found (0xe90007)
psql:tcom-database.sql:7456: lost synchronization with server, resetting
connection
psql:tcom-database.sql:7468: ERROR: copy: line 6, Invalid UNICODE character
sequence found (0xe97300)

Any ideas?

Thanks for your help.

Philippe Kiener

On 25.2.2003 8:55, "Antti Haapala" <antti.haapala@iki.fi> wrote:


On Mon, 24 Feb 2003, Joseph Shraibman wrote:

I have a server that has LATIN1 encoding. I want to convert it to run UTF encoding. How
do I do that? Simply changing the encoding in a dump file does not work.

So have you done both of these:
- dropped and recreated your db with encoding 'utf-8'
- converted your dumps to utf-8 or
added set client_encoding to 'latin1' in the dumps

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Philippe Kiener (#3)
Re: How do I change the server encoding?

Philippe Kiener writes:

My database should be transformed from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get this error on tables with text:

psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
sequence found (0xe96500)

The client encoding SQL_ASCII means that the data will be passed through
unchanged. Try setting it to LATIN1.
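
The bytes in the error message fit this explanation: 0xe9 is "é" in LATIN1, but in UTF-8 it starts a three-byte sequence, so LATIN1 text that is passed through unconverted gets rejected. A minimal Python sketch of the difference follows. The sample bytes and the helper name `is_valid_utf8` are assumptions chosen for illustration, not data taken from the dump.

```python
# Illustrative sample: "é" followed by "e" as LATIN1 bytes (0xe9 0x65).
latin1_bytes = b"\xe9\x65"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Passed through unchanged, as with client encoding SQL_ASCII,
# the bytes are not valid UTF-8.
passed_through = latin1_bytes

# Declared as LATIN1, the same bytes can be converted to UTF-8 first.
converted = latin1_bytes.decode("latin-1").encode("utf-8")

print(is_valid_utf8(passed_through))  # False
print(is_valid_utf8(converted))       # True
print(converted.hex())                # c3a965
```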

--
Peter Eisentraut peter_e@gmx.net

#5Joseph Shraibman
jks@selectacast.net
In reply to: Peter Eisentraut (#4)
Re: How do I change the server encoding?

Peter Eisentraut wrote:

Philippe Kiener writes:

My database should be transformed from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get this error on tables with text:

psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
sequence found (0xe96500)

The client encoding SQL_ASCII means that the data will be passed through
unchanged. Try setting it to LATIN1.

I tried with latin1 and it didn't work.

#6Joseph Shraibman
jks@selectacast.net
In reply to: Joseph Shraibman (#5)
Re: How do I change the server encoding?

Joseph Shraibman wrote:
After further experimenting I think the problem is in psql. When I try
update mytable set firstname = 'Oné' where ukey = 12911;

It works with a latin1 database, but when I try it on a unicode database:

utfowl=# update mytable set firstname = 'Oné' where ukey = 12911;
utfowl'#

It thinks there is an open quote or something. This happens even if I set the client
encoding to latin1. Of course dumps are read with the copy command, but maybe it is the
same problem.

#7Antti Haapala
antti.haapala@iki.fi
In reply to: Joseph Shraibman (#5)
Re: How do I change the server encoding?

On Tue, 25 Feb 2003, Joseph Shraibman wrote:

Peter Eisentraut wrote:

Philippe Kiener writes:

My database should be transformed from SQL_ASCII to utf-8.

I have added this line to my dumps:

SET CLIENT_ENCODING TO 'SQL_ASCII';

Now when I load the dump into my db, I get this error on tables with text:

psql:tcom-database.sql:7111: ERROR: copy: line 1, Invalid UNICODE character
sequence found (0xe96500)

The client encoding SQL_ASCII means that the data will be passed through
unchanged. Try setting it to LATIN1.

I tried with latin1 and it didn't work.

Hmm... did it still cause errors? I think that because newer dumps contain
\connect commands, you need to add an explicit character set setting after
each of them.

The better way would be to convert the whole dump with iconv, though.
iconv comes by default with many unixen. For example, the command

iconv -f iso-8859-1 -t utf-8 < text_dump > text_dump_converted

will convert your dump from latin1 to utf-8.
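
Where iconv is not available, the same conversion can be sketched in Python. This is only an illustration under assumptions: the function name `convert_dump` and the chunk size are made up here, and the file names simply mirror the iconv example.

```python
def convert_dump(src_path: str, dst_path: str,
                 src_encoding: str = "iso-8859-1",
                 dst_encoding: str = "utf-8") -> None:
    """Re-encode a text dump, reading it in chunks to keep memory use low."""
    # newline="" leaves line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for chunk in iter(lambda: src.read(64 * 1024), ""):
            dst.write(chunk)
```

Calling `convert_dump("text_dump", "text_dump_converted")` would correspond to the iconv invocation above.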

--
Antti Haapala

#8Joseph Shraibman
jks@selectacast.net
In reply to: Joseph Shraibman (#6)
Re: How do I change the server encoding? SOLVED

Joseph Shraibman wrote:

Joseph Shraibman wrote:
After further experimenting I think the problem is in psql. When I try
update mytable set firstname = 'Oné' where ukey = 12911;

It works with a latin1 database, but when I try it on a unicode database:

utfowl=# update mytable set firstname = 'Oné' where ukey = 12911;
utfowl'#

It thinks there is an open quote or something. This happens even if I
set the client encoding to latin1. Of course dumps are read with the
copy command, but maybe it is the same problem.

I solved the problem. "set client_encoding = 'latin1';" does not work, but "\encoding
latin1" does. I suggest that pg_dump put a "\encoding <encoding>" after every \connect
in the dump. I would do this myself but I can't figure out where that is done in the dump
program.
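
Until pg_dump does this itself, the suggestion can be approximated by post-processing the dump file. The sketch below is not a pg_dump feature: the function name `add_encoding_after_connect` and the sample dump text are hypothetical. It works on bytes so that the dump's own encoding never has to be guessed.

```python
def add_encoding_after_connect(dump: bytes, encoding: str = "latin1") -> bytes:
    """Insert a psql encoding directive after every connect line of a dump."""
    directive = b"\\encoding " + encoding.encode("ascii")
    out = []
    for line in dump.split(b"\n"):
        out.append(line)
        # psql meta-commands start at the beginning of a line.
        if line.startswith(b"\\connect"):
            out.append(directive)
    return b"\n".join(out)


# Made-up two-line dump fragment, for illustration only.
sample = b"\\connect - joe\nCOPY mytable FROM stdin;\n"
print(add_encoding_after_connect(sample).decode("ascii"))
```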

I did modify pg_dump.c so the encoding used during the dump can be specified on the
command line, but since that isn't what solved the problem I'm not sure there is a point
to having it. Is anyone interested?