Questions about encoding between two databases

Started by Archibald Zimonyi, over 16 years ago, 8 messages, general
#1Archibald Zimonyi
arsi@aranzo.netg.se

Hello,

I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.

From all I can read I should have no problem with the actual format of the
pg_dump file (for actual dumping and restoring purposes) but I am
having problems with encoding (which I was fairly sure I would). I have
searched the web for solutions and one solution given (in one thread where
Tom Lane answered) was to set the correct encoding in the version 8.3.x
database.

However, the default encoding in the version 8.3.x instance is
currently UTF8 and I am happy with that. The encoding for most of the
databases in the version 7.4.x was LATIN1. Is there any way I can ignore
the LATIN1 encoding and force the database to accept the UTF8 encoding of
the new version 8.3.x instance?

I get the below message when I try the psql -f <file> <database> command.

psql:aranzo20090812:30: ERROR: encoding LATIN1 does not match server's
locale en_US.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

Any help would be appreciated.

Archie

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Archibald Zimonyi (#1)
Re: Questions about encoding between two databases

On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:

Hello,

I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.
From all I can read I should have no problem with actual format of the
pgdump file (for actual dumping and restoring purposes) but I am
having problems with encoding (which I was fairly sure I would). I have
searched the web for solutions and one solution given (in one thread where
Tom Lane answered) was to set the correct encoding in the version 8.3.x
database.

However, the default encoding in the version 8.3.x instance is
currently UTF8 and I am happy with that. The encoding for most of the
databases in the version 7.4.x was LATIN1. Is there any way I can ignore
the LATIN1 encoding and force the database to accept the UTF8 encoding of
the new version 8.3.x instance?

I get the below message when I try the psql -f <file> <database> command.

psql:aranzo20090812:30: ERROR: encoding LATIN1 does not match server's
locale en_US.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

Any help would be appreciated.

Archie

To get the question out of the way, is there a reason you are not upgrading to
latest version, 8.4?

Suggestion below is untested:
Use pg_dump from 8.3.x to dump from 7.4 database.

From here:
http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html

"
-E encoding
--encoding=encoding

Create the dump in the specified character set encoding. By default, the
dump is created in the database encoding. (Another way to get the same result
is to set the PGCLIENTENCODING environment variable to the desired dump
encoding.) "

Use the encoding switch to create the dump in UTF8.

--
Adrian Klaver
aklaver@comcast.net
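
For readers following along, the suggestion above might look something like the sketch below. It assumes the 8.3 client tools are installed on the new machine and that the 7.4 server accepts network connections from it; pg_dump can connect to a remote server with -h, so the two servers do not need to share a machine. The function name dump_as_utf8 and the host and database names are placeholders, not anything taken from this thread, and like the original suggestion this has not been tested against a 7.4 server.

```shell
# Hypothetical helper: run the 8.3 pg_dump against a remote (older) server
# and ask for the dump to be written in UTF8 instead of the database encoding.
#   $1 = host name of the old server (placeholder)
#   $2 = database name (placeholder)
dump_as_utf8() {
    pg_dump -h "$1" -E UTF8 "$2"
}

# Example call (commented out so nothing runs by accident):
# dump_as_utf8 old-server.example.com mydb > mydb.utf8.sql
```

As the quoted documentation says, setting the PGCLIENTENCODING environment variable to UTF8 before calling pg_dump is described as another way to get the same result.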

#3Archibald Zimonyi
arsi@aranzo.netg.se
In reply to: Adrian Klaver (#2)
Re: Questions about encoding between two databases

On Fri, 21 Aug 2009, Adrian Klaver wrote:

On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:

Hello,

I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.
From all I can read I should have no problem with actual format of the
pgdump file (for actual dumping and restoring purposes) but I am
having problems with encoding (which I was fairly sure I would). I have
searched the web for solutions and one solution given (in one thread where
Tom Lane answered) was to set the correct encoding in the version 8.3.x
database.

However, the default encoding in the version 8.3.x instance is
currently UTF8 and I am happy with that. The encoding for most of the
databases in the version 7.4.x was LATIN1. Is there any way I can ignore
the LATIN1 encoding and force the database to accept the UTF8 encoding of
the new version 8.3.x instance?

I get the below message when I try the psql -f <file> <database> command.

psql:aranzo20090812:30: ERROR: encoding LATIN1 does not match server's
locale en_US.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.

Any help would be appreciated.

Archie

To get the question out of the way, is there a reason you are not upgrading to
latest version, 8.4?

Yes, I use Debian stable which as far as I know only has 8.3.x as
its latest version. But it shouldn't really matter in this case as I would
most likely have the same problem with 8.4.x.

Suggestion below is untested:
Use pg_dump from 8.3.x to dump from 7.4 database.

The two versions are located on two different machines, so probably not
possible.

From here:
http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html

"
-E encoding
--encoding=encoding

Create the dump in the specified character set encoding. By default, the
dump is created in the database encoding. (Another way to get the same result
is to set the PGCLIENTENCODING environment variable to the desired dump
encoding.) "

Use the encoding switch to create the dump in UTF8.

I will look at this PGCLIENTENCODING variable to see if I can set that in
7.4.x but does anyone know the answer to it already? Would it work?

Will that also work with pg_dumpall?

Thanks for the response so far.

Archie

#4Archibald Zimonyi
arsi@aranzo.netg.se
In reply to: Archibald Zimonyi (#3)
Re: Questions about encoding between two databases

Hello,

I tried changing the client_encoding setting but there was no difference
in the result.

I went into the generated dump file and (more a wish than anything else)
tried to simply change the encoding from LATIN1 to UTF8 and then load the
file, it did not complain about incorrect encoding setting for the load,
however it complained that the characters did not match true UTF8
characters (which was almost what I guessed would happen).

So back to square one again.

Archie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Archibald Zimonyi (#4)
Re: Questions about encoding between two databases

Archibald Zimonyi <arsi@aranzo.netg.se> writes:

I went into the generated dump file and (more wish then anything else)
tried to simply change the encoding from LATIN1 to UTF8 and then load the
file, it did not complain about incorrect encoding setting for the load,
however it complained that the characters did not match true UTF8
characters (which was almost what I guessed would happen).

Indeed. Do *not* change the client_encoding setting in the dump file.
You can edit the ENCODING options in the CREATE DATABASE commands
though. (Didn't we explain this to you already?)

regards, tom lane
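
A rough sketch of the edit described above, for anyone landing on this thread later. The idea is that client_encoding tells the server which encoding the bytes in the file are in, so it has to stay LATIN1 for an unconverted dump; the server then converts the data to the database encoding on load. Only the ENCODING clause of CREATE DATABASE needs to change. The file names and the sample CREATE DATABASE line are made up for illustration, and the exact wording of that line in a real 7.4 dump may differ, so check your own file before relying on the pattern.

```shell
# Build a tiny stand-in for a dump file (contents are illustrative only).
cat > sample_dump.sql <<'EOF'
CREATE DATABASE mydb WITH TEMPLATE = template0 ENCODING = 'LATIN1';
SET client_encoding = 'LATIN1';
EOF

# Rewrite ENCODING only on CREATE DATABASE lines; the
# SET client_encoding line is deliberately left untouched.
sed "/^CREATE DATABASE/ s/ENCODING = 'LATIN1'/ENCODING = 'UTF8'/" \
    sample_dump.sql > sample_dump.fixed.sql

cat sample_dump.fixed.sql
```

Restricting the substitution to lines starting with CREATE DATABASE is a deliberate choice: a blanket search-and-replace of LATIN1 would also hit the client_encoding line, which is exactly the change warned against above.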

#6Archibald Zimonyi
arsi@aranzo.netg.se
In reply to: Tom Lane (#5)
Re: Questions about encoding between two databases

Hello,

Archibald Zimonyi <arsi@aranzo.netg.se> writes:

I went into the generated dump file and (more wish then anything else)
tried to simply change the encoding from LATIN1 to UTF8 and then load the
file, it did not complain about incorrect encoding setting for the load,
however it complained that the characters did not match true UTF8
characters (which was almost what I guessed would happen).

Indeed. Do *not* change the client_encoding setting in the dump file.
You can edit the ENCODING options in the CREATE DATABASE commands
though. (Didn't we explain this to you already?)

regards, tom lane

Well, I did send this query with an incorrect email address so it got
stuck and was never posted properly, so I have not seen any such reply.
Can you please explain again?

You say I can edit the ENCODING options in the CREATE DATABASE commands,
yet these commands exist in the dump file. I don't understand.

But yes, after my change, the database schemas were all created with UTF8
so that part worked, but of course the actual text which was LATIN1 before
failed for those character sets where UTF8 differs from LATIN1, so it
still fails.

I will try using iconv as suggested in another reply, but shouldn't that
then mean I need to change the client_encoding (so that it matches)?

Archie

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Archibald Zimonyi (#6)
Re: Questions about encoding between two databases

Archibald Zimonyi wrote:

Hello,

Archibald Zimonyi <arsi@aranzo.netg.se> writes:

I went into the generated dump file and (more wish then anything else)
tried to simply change the encoding from LATIN1 to UTF8 and then load the
file, it did not complain about incorrect encoding setting for the load,
however it complained that the characters did not match true UTF8
characters (which was almost what I guessed would happen).

Indeed. Do *not* change the client_encoding setting in the dump file.
You can edit the ENCODING options in the CREATE DATABASE commands
though. (Didn't we explain this to you already?)

Well, I did send this query with an incorrect email address so it
got stuck and was never posted properly, so I have not seen any such
reply. Can you please explain again?

Search the archives: http://archives.postgresql.org/

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Archibald Zimonyi
arsi@aranzo.netg.se
In reply to: Alvaro Herrera (#7)
Re: Questions about encoding between two databases

Hello,

iconv seemed to work fine. I converted the dump file from LATIN1 to UTF8
and kept the changes in the client_encoding (in the dump file) and loaded
them all into the database.

No complaints. I still need to verify the result but at least I got no
restore errors based on character encoding.

Thanks for the tips.

Archie
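
The iconv route described above could be sketched roughly as follows. Here the bytes of the file are converted and the declared client_encoding is changed to match, so the two stay consistent (unlike editing client_encoding alone). The file names and the sample SQL are placeholders made up for this example; ISO-8859-1 is the name iconv commonly uses for LATIN1.

```shell
# Stand-in dump file encoded in LATIN1: \351 is the single byte for "é".
printf "SET client_encoding = 'LATIN1';\nINSERT INTO t VALUES ('caf\351');\n" \
    > dump.latin1.sql

# Convert the bytes from LATIN1 (ISO-8859-1) to UTF-8 ...
iconv -f ISO-8859-1 -t UTF-8 dump.latin1.sql > dump.utf8.tmp

# ... and make the declared client_encoding match the converted bytes.
sed "s/^SET client_encoding = 'LATIN1';/SET client_encoding = 'UTF8';/" \
    dump.utf8.tmp > dump.utf8.sql

# Sanity check: iconv exits non-zero if the file is not valid UTF-8.
iconv -f UTF-8 -t UTF-8 dump.utf8.sql > /dev/null && echo "valid UTF-8"
```

The final check only confirms that the file is well-formed UTF-8. It does not prove the original data really was LATIN1, so spot-checking a few rows with accented characters after the restore is still worthwhile.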
