Questions about encoding between two databases
Hello,
I am sitting on version 7.4.x and am going to upgrade to version 8.3.x.
From all I can read I should have no problem with the actual format of the
pg_dump file (for actual dumping and restoring purposes), but I am
having problems with encoding (which I was fairly sure I would). I have
searched the web for solutions and one solution given (in one thread where
Tom Lane answered) was to set the correct encoding in the version 8.3.x
database.
However, the default encoding in the version 8.3.x instance is
currently UTF8 and I am happy with that. The encoding for most of the
databases in the version 7.4.x was LATIN1. Is there any way I can ignore
the LATIN1 encoding and force the database to accept the UTF8 encoding of
the new version 8.3.x instance?
I get the below message when I try the psql -f <file> <database> command.
psql:aranzo20090812:30: ERROR: encoding LATIN1 does not match server's
locale en_US.UTF-8
DETAIL: The server's LC_CTYPE setting requires encoding UTF8.
Any help would be appreciated.
Archie
On Thursday 20 August 2009 11:45:30 pm Archibald Zimonyi wrote:
> [...] Is there any way I can ignore the LATIN1 encoding and force the
> database to accept the UTF8 encoding of the new version 8.3.x instance?
To get the question out of the way, is there a reason you are not upgrading to
latest version, 8.4?
Suggestion below is untested:
Use pg_dump from 8.3.x to dump from 7.4 database.
From here:
http://www.postgresql.org/docs/8.3/interactive/app-pgdump.html
"
-E encoding
--encoding=encoding
Create the dump in the specified character set encoding. By default, the
dump is created in the database encoding. (Another way to get the same result
is to set the PGCLIENTENCODING environment variable to the desired dump
encoding.) "
Use the encoding switch to create the dump in UTF8.
--
Adrian Klaver
aklaver@comcast.net
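The suggestion above does not need both versions on one machine, since pg_dump is a client program and can connect to a remote server with -h. A minimal sketch, assuming the 7.4 server accepts TCP connections from the 8.3 machine; the host name old-server.example and the database name mydb are hypothetical. The script only prints the command unless RUN_DUMP=yes is set.

```shell
# Hypothetical names: replace with the real 7.4 host and database.
OLD_HOST="old-server.example"
DB="mydb"

# -h connects to the remote 7.4 server; -E UTF8 asks for the dump in UTF8,
# so the text is converted from LATIN1 while it is being dumped.
DUMP_CMD="pg_dump -h $OLD_HOST -E UTF8 -f $DB.utf8.sql $DB"

# Dry run by default; set RUN_DUMP=yes to actually execute the command.
if [ "${RUN_DUMP:-no}" = "yes" ]; then
    $DUMP_CMD
else
    echo "would run: $DUMP_CMD"
fi
```

The resulting file should then load with psql -f into a database that was already created with UTF8 encoding.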
On Fri, 21 Aug 2009, Adrian Klaver wrote:
> To get the question out of the way, is there a reason you are not upgrading to
> latest version, 8.4?
Yes, I use Debian stable, which as far as I know only has 8.3.x as
its latest version. But it shouldn't really matter in this case, as I would
most likely have the same problem with 8.4.x.
> Suggestion below is untested:
> Use pg_dump from 8.3.x to dump from 7.4 database.
The two versions are located on two different machines, so probably not
possible.
> [...] (Another way to get the same result is to set the PGCLIENTENCODING
> environment variable to the desired dump encoding.)
> Use the encoding switch to create the dump in UTF8.
I will look at this PGCLIENTENCODING variable to see if I can set that in
7.4.x but does anyone know the answer to it already? Would it work?
Will that also work with pg_dumpall?
Thanks for the response so far.
Archie
Hello,
I tried changing the client_encoding setting but there was no difference
in the result.
I went into the generated dump file and (more in hope than anything else)
tried to simply change the encoding from LATIN1 to UTF8 and then load the
file. It did not complain about an incorrect encoding setting for the load,
but it did complain that the characters were not valid UTF8
characters (which was almost what I guessed would happen).
So back to square one again.
Archie
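Why relabeling alone fails can be shown without a database. A minimal sketch, assuming iconv is installed; the file name and the sample word are made up for illustration. It writes one LATIN1-encoded word and then asks iconv to read the unchanged bytes as UTF-8, which is roughly what the server is asked to do when only the declared encoding in the dump is edited.

```shell
work=$(mktemp -d)

# Stand-in for LATIN1 dump data: the word "cafe" with an accented e,
# stored as the single LATIN1 byte 0xE9 (octal 351).
printf 'caf\351\n' > "$work/sample.latin1.txt"

# Relabeling only: read the unchanged bytes as if they were UTF-8.
# A lone 0xE9 byte is not a valid UTF-8 sequence, so iconv rejects it.
if iconv -f UTF-8 -t UTF-8 "$work/sample.latin1.txt" > /dev/null 2>&1; then
    relabel_ok=yes
else
    relabel_ok=no
fi
echo "bytes accepted as UTF-8: $relabel_ok"
```

Plain ASCII text is identical in both encodings, which is why only rows containing accented or other non-ASCII characters trigger the error.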
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Archibald Zimonyi <arsi@aranzo.netg.se> writes:
> I went into the generated dump file and (more in hope than anything else)
> tried to simply change the encoding from LATIN1 to UTF8 and then load the
> file. It did not complain about an incorrect encoding setting for the load,
> but it did complain that the characters were not valid UTF8
> characters (which was almost what I guessed would happen).
Indeed. Do *not* change the client_encoding setting in the dump file.
You can edit the ENCODING options in the CREATE DATABASE commands
though. (Didn't we explain this to you already?)
regards, tom lane
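A sketch of the edit described above, assuming a dump script whose CREATE DATABASE lines contain ENCODING = 'LATIN1'. The two sample lines are a made-up stand-in, not real dump output, and the database name demo is hypothetical. The sed address limits the substitution to CREATE DATABASE lines, so the client_encoding line is left alone.

```shell
work=$(mktemp -d)

# Made-up stand-in for the relevant lines of a dump script.
printf "%s\n" \
    "CREATE DATABASE demo WITH TEMPLATE = template0 ENCODING = 'LATIN1';" \
    "SET client_encoding = 'LATIN1';" > "$work/dumpall.sql"

# Change ENCODING only on lines that start with CREATE DATABASE.
sed "/^CREATE DATABASE /s/ENCODING = 'LATIN1'/ENCODING = 'UTF8'/" \
    "$work/dumpall.sql" > "$work/dumpall.edited.sql"

cat "$work/dumpall.edited.sql"
```

With client_encoding left at LATIN1, the data in the file is still declared correctly, and the server converts it to the UTF8 database encoding as it is loaded.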
Hello,
> Indeed. Do *not* change the client_encoding setting in the dump file.
> You can edit the ENCODING options in the CREATE DATABASE commands
> though. (Didn't we explain this to you already?)
Well, I did send this query with an incorrect email address so it got
stuck and was never posted properly, so I have not seen any such reply.
Can you please explain again?
You mention editing the ENCODING options in the CREATE DATABASE commands, yet
these commands are in the dump file as well. I don't understand.
But yes, after my change, the database schemas were all created with UTF8,
so that part worked. Of course, the actual text, which was LATIN1 before,
failed for those characters where UTF8 differs from LATIN1, so it
still fails.
I will try using iconv as suggested in another reply, but shouldn't that
then mean I need to change the client_encoding (so that it matches)?
Archie
Archibald Zimonyi wrote:
> Well, I did send this query with an incorrect email address so it
> got stuck and was never posted properly, so I have not seen any such
> reply. Can you please explain again?
Search the archives: http://archives.postgresql.org/
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hello,
iconv seemed to work fine. I converted the dump file from LATIN1 to UTF8,
kept the changes to the client_encoding (in the dump file), and loaded
everything into the database.
No complaints. I still need to verify the result, but at least I got no
restore errors based on character encoding.
Thanks for the tips.
Archie
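The iconv route described above can be sketched as follows, assuming iconv and sed are available. The two-line file is a made-up stand-in for a real dump, and ISO-8859-1 is the iconv name for what PostgreSQL calls LATIN1. Once the bytes are transcoded, the client_encoding declaration has to say UTF8 so that it matches the file contents.

```shell
work=$(mktemp -d)

# Made-up stand-in for a LATIN1 dump: the declaration plus one row whose
# text contains an accented e stored as the LATIN1 byte 0xE9 (octal 351).
printf "SET client_encoding = 'LATIN1';\nINSERT INTO t VALUES ('caf\351');\n" \
    > "$work/dump.latin1.sql"

# Step 1: transcode the bytes from LATIN1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 "$work/dump.latin1.sql" > "$work/dump.utf8.sql"

# Step 2: make the declaration match the converted bytes.
sed "s/^SET client_encoding = 'LATIN1';/SET client_encoding = 'UTF8';/" \
    "$work/dump.utf8.sql" > "$work/dump.final.sql"

# Sanity check: the final file must be readable as UTF-8.
if iconv -f UTF-8 -t UTF-8 "$work/dump.final.sql" > /dev/null 2>&1; then
    converted_ok=yes
else
    converted_ok=no
fi
echo "valid UTF-8 after conversion: $converted_ok"
```

Doing only one of the two steps leaves the declaration and the bytes out of sync, which is the situation that produced the earlier load errors.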