Easy way to convert a database from WIN1252 to UTF8?
I'd like to convert a small database to UTF8 before it becomes too
large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin
has any native way of doing this, what's the easiest way to go about
doing this? Thanks!
Mike
On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote:
On 7/1/2010 11:08 AM, Mike Christensen wrote:
I'd like to convert a small database to UTF8 before it becomes too
large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin
has any native way of doing this, what's the easiest way to go about
doing this? Thanks!Mike
Dump/Backup the database , then create a new database using utf-8
then restore the database.
This is what I'm trying to do, but it's a total nightmare..
First, I did a:
pg_dump -U root MyDB > c:\DB.dbs.out
which appears to have worked.. Then, I edited the file in Notepad and
saved it as UTF8 which also appears to have worked. Next, I created
the new DB using UTF8, and ran:
psql -U root MyDB2 < c:\DB.dbs.out
I get pages and pages of errors about foreign key restraint violations
and other stuff. Looking at the DB after, almost all tables are
empty. When I look at the DB.dbs.out file more carefully, the problem
is fairly obvious. It attempts to create all the tables in
alphabetical order. First, it inserts data into the "A" table which
has a FK restraint on the B table which isn't populated yet.
Obviously, this is going to cause problems. Perhaps when restoring a
DB you're supposed to drop all restraints first? I suppose I could
hack this into working eventually, but I was hoping there was an
easier way.. Thanks..
Mike
Import Notes
Reply to msg id not found: 4C2CCECF.8030108@magwerks.com
On Thu, Jul 1, 2010 at 9:31 AM, Mike Christensen <mike@kitchenpc.com> wrote:
On Thu, Jul 1, 2010 at 10:22 AM, Justin Graf <justin@magwerks.com> wrote:
On 7/1/2010 11:08 AM, Mike Christensen wrote:
I'd like to convert a small database to UTF8 before it becomes too
large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin
has any native way of doing this, what's the easiest way to go about
doing this? Thanks!Mike
Dump/Backup the database , then create a new database using utf-8
then restore the database.This is what I'm trying to do, but it's a total nightmare..
First, I did a:
pg_dump -U root MyDB > c:\DB.dbs.out
which appears to have worked.. Then, I edited the file in Notepad and
saved it as UTF8 which also appears to have worked. Next, I created
the new DB using UTF8, and ran:psql -U root MyDB2 < c:\DB.dbs.out
I get pages and pages of errors about foreign key restraint violations
and other stuff. Looking at the DB after, almost all tables are
empty. When I look at the DB.dbs.out file more carefully, the problem
is fairly obvious. It attempts to create all the tables in
alphabetical order. First, it inserts data into the "A" table which
has a FK restraint on the B table which isn't populated yet.
Obviously, this is going to cause problems. Perhaps when restoring a
DB you're supposed to drop all restraints first? I suppose I could
hack this into working eventually, but I was hoping there was an
easier way.. Thanks..Mike
Ok first off, my bad it seems the db_dump scripts are smart enough to
add constraints last..
Here's the actual errors I get while restoring, which seem to result
in several tables being empty. Does this mean that the data in my
source database has somehow lost integrity, even though I have
constraints in place?
ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT: COPY notifications, line 170
ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT: COPY recipes, line 2
ERROR: character 0x9d of encoding "WIN1252" has no equivalent in "UTF8"
CONTEXT: COPY wallposts, line 13
ERROR: insert or update on table "reciperatings" violates foreign key constrain
t "fk2f0492774b4d795a"
DETAIL: Key (recipeid)=(1202a454-ca32-49e0-9f38-0d973d80b2a9) is not present in
table "recipes".
ERROR: insert or update on table "recipeingredients" violates foreign key const
raint "fk673433f4b4d795a"
DETAIL: Key (recipeid)=(139ff23b-bead-48ac-b606-fc12467ff940) is not present in
table "recipes".
ERROR: insert or update on table "calendars" violates foreign key constraint "f
kb1f25cf24b4d795a"
DETAIL: Key (recipeid)=(2f944c26-efd1-406c-bd27-0be760e09f19) is not present in
table "recipes".
ERROR: insert or update on table "recipetags" violates foreign key constraint "
fkeabd5f754b4d795a"
DETAIL: Key (recipeid)=(f9ebec00-6ebd-459d-a093-060bc3a755af) is not present in
table "recipes".
ERROR: insert or update on table "recipecomments" violates foreign key constrai
nt "fkec92d3c74b4d795a"
DETAIL: Key (recipeid)=(1af68f09-0cba-4fae-8a08-9e56a3e72676) is not present in
table "recipes".
ERROR: insert or update on table "favorites" violates foreign key constraint "f
kf92d78404b4d795a"
DETAIL: Key (recipeid)=(86c9e722-3508-4910-8a3e-10a705271cbb) is not present in
table "recipes".
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
Then, I edited the file in Notepad and saved it as UTF8 which also
appears to have worked.
I don't think you want to be doing this. The file should contain a "set
client_encoding" of the correct value which will let PG to do the "right
thing" when you import it back into a UTF8 database.
--
Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
Then, I edited the file in Notepad and saved it as UTF8 which also
appears to have worked.I don't think you want to be doing this. The file should contain a "set
client_encoding" of the correct value which will let PG to do the "right
thing" when you import it back into a UTF8 database.--
Sam http://samason.me.uk/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yup, that fixed it. I had to change the DB.dbs.out file to use the
new encoding. Duh! Everything's working now.
Mike
On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote:
On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
Then, I edited the file in Notepad and saved it as UTF8 which also
appears to have worked.I don't think you want to be doing this. The file should contain a "set
client_encoding" of the correct value which will let PG to do the "right
thing" when you import it back into a UTF8 database.Yup, that fixed it. I had to change the DB.dbs.out file to use the
new encoding. Duh! Everything's working now.
AFAIK, you should be able to leave the file exactly as it came out of
pg_dump and PG will do all the conversions automatically.
If you've not checked already, it may be worth checking through to see
what row was on line 170 and see if it's got the right thing in there
now.
--
Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 9:54 AM, Sam Mason <sam@samason.me.uk> wrote:
On Thu, Jul 01, 2010 at 09:47:03AM -0700, Mike Christensen wrote:
On Thu, Jul 1, 2010 at 9:44 AM, Sam Mason <sam@samason.me.uk> wrote:
On Thu, Jul 01, 2010 at 09:31:12AM -0700, Mike Christensen wrote:
Then, I edited the file in Notepad and saved it as UTF8 which also
appears to have worked.I don't think you want to be doing this. The file should contain a "set
client_encoding" of the correct value which will let PG to do the "right
thing" when you import it back into a UTF8 database.Yup, that fixed it. I had to change the DB.dbs.out file to use the
new encoding. Duh! Everything's working now.AFAIK, you should be able to leave the file exactly as it came out of
pg_dump and PG will do all the conversions automatically.If you've not checked already, it may be worth checking through to see
what row was on line 170 and see if it's got the right thing in there
now.--
Sam http://samason.me.uk/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yup, the problem is line 170 doesn't actually match up to the
DB.dbs.out file line 170 (which is a blank line). I believe it means
line 170 from the stdin pipe it was processing for the copy command.
Suffice to say, there was some weird character in my database that PG
can't automatically translate from WIN1252 to UTF8, and apparently it
will drop that /entire/ COPY command (the entire table doesn't get
populated!)..
As to what character was the culprit, I'm not entirely sure how to
figure this out. I guess I could look for that hex value? However,
if I set the encoding in the script itself, everything works
perfectly.
I'm not sure this is a PG bug or anything, but having more detail on
this error might be helpful.
Mike
On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote:
Yup, the problem is line 170 doesn't actually match up to the
DB.dbs.out file line 170 (which is a blank line). I believe it means
line 170 from the stdin pipe it was processing for the copy command.
Doh, that's annoying. It would be nice to know that it's done the right
thing rather than "some" thing.
Suffice to say, there was some weird character in my database that PG
can't automatically translate from WIN1252 to UTF8, and apparently it
will drop that /entire/ COPY command (the entire table doesn't get
populated!)..
Yup, this is deliberate. You can also run psql with "-1" to put the
whole lot (i.e. every table/view/... creation and data insert) in a
transaction which will cause the whole restore to be rolled back if
something doesn't look right as well.
As to what character was the culprit, I'm not entirely sure how to
figure this out. I guess I could look for that hex value? However,
if I set the encoding in the script itself, everything works
perfectly.
PG is doing the right thing, 9D is undefined in Win1252. I guess you've
either got other problems or this was just an artifact of converting
from Win1252 to UTF8 external to PG and then not telling it that you'd
done that.
--
Sam http://samason.me.uk/
On Thu, Jul 1, 2010 at 10:07 AM, Sam Mason <sam@samason.me.uk> wrote:
On Thu, Jul 01, 2010 at 10:01:02AM -0700, Mike Christensen wrote:
Yup, the problem is line 170 doesn't actually match up to the
DB.dbs.out file line 170 (which is a blank line). I believe it means
line 170 from the stdin pipe it was processing for the copy command.Doh, that's annoying. It would be nice to know that it's done the right
thing rather than "some" thing.Suffice to say, there was some weird character in my database that PG
can't automatically translate from WIN1252 to UTF8, and apparently it
will drop that /entire/ COPY command (the entire table doesn't get
populated!)..Yup, this is deliberate. You can also run psql with "-1" to put the
whole lot (i.e. every table/view/... creation and data insert) in a
transaction which will cause the whole restore to be rolled back if
something doesn't look right as well.As to what character was the culprit, I'm not entirely sure how to
figure this out. I guess I could look for that hex value? However,
if I set the encoding in the script itself, everything works
perfectly.PG is doing the right thing, 9D is undefined in Win1252. I guess you've
either got other problems or this was just an artifact of converting
from Win1252 to UTF8 external to PG and then not telling it that you'd
done that.--
Sam http://samason.me.uk/--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yeah, looking at the lines in question I don't really see anything
wrong with them. Everything is going into the database as UTF8 so
maybe some weird characters got stuck in there somehow with the old
default encoding. This is the main reason why I'm converting to UTF8
now, so data will be consistent across all layers.. Good to get these
bugs out of the way while the data set is relatively small.
If anyone wants me to do any more debugging, I'd be more than happy to
but I'm satisfied with the results. Thanks!
Mike
On 7/1/2010 11:08 AM, Mike Christensen wrote:
I'd like to convert a small database to UTF8 before it becomes too
large. I'm running on 8.3.x on Windows. It doesn't seem that pgAdmin
has any native way of doing this, what's the easiest way to go about
doing this? Thanks!Mike
Dump/Backup the database , then create a new database using utf-8
then restore the database.
All legitimate Magwerks Corporation quotations are sent in a .PDF file attachment with a unique ID number generated by our proprietary quotation system. Quotations received via any other form of communication will not be honored.
CONFIDENTIALITY NOTICE: This e-mail, including attachments, may contain legally privileged, confidential or other information proprietary to Magwerks Corporation and is intended solely for the use of the individual to whom it addresses. If the reader of this e-mail is not the intended recipient or authorized agent, the reader is hereby notified that any unauthorized viewing, dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and destroy all occurrences of this e-mail immediately.
Thank you.