Change server encoding after the fact

Started by Cody Caughlanover 14 years ago18 messagesgeneral
Jump to latest
#1Cody Caughlan
toolbag@gmail.com

I would like to change my server_encoding which is currently SQL_ASCII to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');

2) Dump my current database

pg_dump -Fc foo > foo.db

3) Drop my current database

drop database foo;

4) recreate it with the proper encoding

create database foo with template = template1 encoding = 'UTF-8';

5) restore from backup

pg_restore -d foo foo.db

Are these the correct steps to perform or is there an easier / in-place way?

Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding and its going into a UTF-8 database)?

Thank you

/Cody

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#1)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:

I would like to change my server_encoding which is currently SQL_ASCII to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0', 'template1');

Just create database using template0 as template and you can skip this step ^^

Are these the correct steps to perform or is there an easier / in-place way?

Also, when I dump my old DB and restore it, will it be converted appropriately (e.g. it came from am SQL_ASCII encoding and its going into a UTF-8 database)?

You might need to set client encoding when restoring. Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.

#3Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#2)
Re: Change server encoding after the fact

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:

I would like to change my server_encoding which is currently SQL_ASCII to

UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',

'template1');

Just create database using template0 as template and you can skip this step
^^

Wouldn't this only work if my template0 was UTF8 itself?

=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

Are these the correct steps to perform or is there an easier / in-place

way?

Also, when I dump my old DB and restore it, will it be converted

appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
UTF-8 database)?

You might need to set client encoding when restoring. Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.

Yes, I see this might be the case. From my playing around with iconv I
cannot even properly do the conversion:

$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661

Uh oh... I cannot event convert it?

Whats my next step at this point if I cannot even convert my data? I'd be OK
with some lossiness.

Thanks again
/Cody

#4Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#3)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com> wrote:

I would like to change my server_encoding which is currently SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',
'template1');

Just create database using template0 as template and you can skip this
step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
       datname        | pg_encoding_to_char
----------------------+---------------------
 template1            | SQL_ASCII
 template0            | SQL_ASCII
 postgres             | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

Well, let's try, shall we? From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
smarlowe | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Are these the correct steps to perform or is there an easier / in-place
way?

Also, when I dump my old DB and restore it, will it be converted
appropriately (e.g. it came from am SQL_ASCII encoding and its going into a
UTF-8 database)?

You might need to set client encoding when restoring.  Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.

Yes, I see this might be the case. From my playing around with iconv I
cannot even properly do the conversion:
$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661

I think you got it backwards, the -f should be somthing other than
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Marlowe (#4)
Re: Change server encoding after the fact

Scott Marlowe <scott.marlowe@gmail.com> writes:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

P.s. I'm not sure why it works, I just know that it does. :)

CREATE DATABASE assumes that template0 cannot contain any non-ASCII
data, so it's okay to clone it and then pretend that the result has some
other encoding. The same assumption cannot be made for template1, since
that's user-modifiable.

regards, tom lane

#6Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#4)
Re: Change server encoding after the fact

Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe <scott.marlowe@gmail.com

wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>

wrote:

I would like to change my server_encoding which is currently SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',
'template1');

Just create database using template0 as template and you can skip this
step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

Well, let's try, shall we? From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
smarlowe | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see an
ideal scenario being one where we permanently convert the template encoding
to UTF8 so going forward I dont have to worry about forgetting to adding the
encoding= 'UTF8' for every new DB I create.

Are these the correct steps to perform or is there an easier /

in-place

way?

Also, when I dump my old DB and restore it, will it be converted
appropriately (e.g. it came from am SQL_ASCII encoding and its going

into a

UTF-8 database)?

You might need to set client encoding when restoring. Or use iconv to
convert from one encoding to another, which is what I usually do.
Note that it's VERY likely you'll have data in a SQL_ASCII db that
won't go into a UTF8 database without some lossiness.

Yes, I see this might be the case. From my playing around with iconv I
cannot even properly do the conversion:
$ pg_dump -Fp foo > foo.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$ iconv -f utf-8 foo.sql > utf8.sql
iconv: illegal input sequence at position 2512661

I think you got it backwards, the -f should be somthing other than
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the same
error:

$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#6)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
wrote:

I would like to change my server_encoding which is currently
SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',
'template1');

Just create database using template0 as template and you can skip this
step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
       datname        | pg_encoding_to_char
----------------------+---------------------
 template1            | SQL_ASCII
 template0            | SQL_ASCII
 postgres             | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

Well, let's try, shall we?  From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from pg_database;
 datname  | pg_encoding_to_char
-----------+---------------------
 template1 | SQL_ASCII
 template0 | SQL_ASCII
 postgres  | SQL_ASCII
 smarlowe  | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see an
ideal scenario being one where we permanently convert the template encoding
to UTF8 so going forward I dont have to worry about forgetting to adding the
encoding= 'UTF8' for every new DB I create.

Ah ok. The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

I think you got it backwards, the -f should be somthing other than
utf-8 right?  That's what the -t should be right?  Try iconv without a
-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the same
error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Cody Caughlan (#6)
Re: Change server encoding after the fact

Cody Caughlan <toolbag@gmail.com> writes:

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see an
ideal scenario being one where we permanently convert the template encoding
to UTF8 so going forward I dont have to worry about forgetting to adding the
encoding= 'UTF8' for every new DB I create.

Well, if you're feeling brave you can mark template0 as having utf8
encoding via a manual update to pg_database. In theory that should
be safe enough. If you know template1 doesn't, and never has, contained
any non-ASCII data, you could do the same to it ... but it would be a
lot safer to drop it and recreate from template0. See
http://wiki.postgresql.org/wiki/Adventures_in_PostgreSQL,_Episode_1
for some context.

regards, tom lane

#9Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#7)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe <scott.marlowe@gmail.com

wrote:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>

wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
wrote:

I would like to change my server_encoding which is currently
SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',
'template1');

Just create database using template0 as template and you can skip

this

step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than template1?

Well, let's try, shall we? From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from

pg_database;

datname | pg_encoding_to_char
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
smarlowe | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see an
ideal scenario being one where we permanently convert the template

encoding

to UTF8 so going forward I dont have to worry about forgetting to adding

the

encoding= 'UTF8' for every new DB I create.

Ah ok. The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

I think you got it backwards, the -f should be somthing other than
utf-8 right? That's what the -t should be right? Try iconv without a
-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the same
error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.

How would I find this? pg_database says my DB is SQL_ASCII.

"show all" says

client_encoding = SQL_ASCII
server_encoding = SQL_ASCII

#10Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#9)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:

On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <toolbag@gmail.com>
wrote:

I would like to change my server_encoding which is currently
SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN ('template0',
'template1');

Just create database using template0 as template and you can skip
this
step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
       datname        | pg_encoding_to_char
----------------------+---------------------
 template1            | SQL_ASCII
 template0            | SQL_ASCII
 postgres             | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how would
creating from a new DB from template0 be any different than
template1?

Well, let's try, shall we?  From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from
pg_database;
 datname  | pg_encoding_to_char
-----------+---------------------
 template1 | SQL_ASCII
 template0 | SQL_ASCII
 postgres  | SQL_ASCII
 smarlowe  | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see an
ideal scenario being one where we permanently convert the template
encoding
to UTF8 so going forward I dont have to worry about forgetting to adding
the
encoding= 'UTF8' for every new DB I create.

Ah ok.  The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

I think you got it backwards, the -f should be somthing other than
utf-8 right?  That's what the -t should be right?  Try iconv without a
-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the same
error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.

How would I find this? pg_database says my DB is SQL_ASCII.
"show all" says
client_encoding = SQL_ASCII
server_encoding = SQL_ASCII

It would have been set by the application accessing postgresql and
inserting the data. I.e. was it a windows app using a typical windows
encoding? etc.

#11Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#10)
Re: Change server encoding after the fact

Its a Rails app and I do have:

encoding: utf8

Set in my DB configuration.

On Fri, Sep 30, 2011 at 12:38 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

Show quoted text

On Fri, Sep 30, 2011 at 1:35 PM, Cody Caughlan <toolbag@gmail.com> wrote:

On Fri, Sep 30, 2011 at 12:31 PM, Scott Marlowe <scott.marlowe@gmail.com

wrote:

On Fri, Sep 30, 2011 at 12:46 PM, Cody Caughlan <toolbag@gmail.com>

wrote:

Please see below.

On Fri, Sep 30, 2011 at 11:38 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 12:26 PM, Cody Caughlan <toolbag@gmail.com>
wrote:

Thanks Scott. See below:

On Fri, Sep 30, 2011 at 11:05 AM, Scott Marlowe
<scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 11:25 AM, Cody Caughlan <

toolbag@gmail.com>

wrote:

I would like to change my server_encoding which is currently
SQL_ASCII
to UTF8.

I have existing data that I would like to keep.

From my understanding of the steps I need to:

1) alter the template1 database encoding via

UPDATE pg_database SET encoding = 6 where datname IN

('template0',

'template1');

Just create database using template0 as template and you can skip
this
step ^^

Wouldn't this only work if my template0 was UTF8 itself?
=> select datname, pg_encoding_to_char(encoding) from pg_database;
datname | pg_encoding_to_char
----------------------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII

So it appears both template0 & template1 are SQL_ASCII, so how

would

creating from a new DB from template0 be any different than
template1?

Well, let's try, shall we? From a freshly created cluster on my
laptop, running 8.4:

smarlowe=# select datname, pg_encoding_to_char(encoding) from
pg_database;
datname | pg_encoding_to_char
-----------+---------------------
template1 | SQL_ASCII
template0 | SQL_ASCII
postgres | SQL_ASCII
smarlowe | SQL_ASCII
(4 rows)

smarlowe=# create database j template template0 encoding 'UTF8';
CREATE DATABASE

Seems to work.

P.s. I'm not sure why it works, I just know that it does. :)

Ok, I see what you mean. This would create a new DB with the proper
encoding. Which is "fine", and probably what I will do. I guess I see

an

ideal scenario being one where we permanently convert the template
encoding
to UTF8 so going forward I dont have to worry about forgetting to

adding

the
encoding= 'UTF8' for every new DB I create.

Ah ok. The way I fix that is this:

update pg_database set datistemplate = false where datname='template1';
drop database template1;
create database template1 template template0 encoding 'UTF8';

But your way would likely work too.

I think you got it backwards, the -f should be somthing other than
utf-8 right? That's what the -t should be right? Try iconv without

a

-f switch and a -t of utf-8 and see what happens...

You're right, I had -f when I needed -t. I tried it again with the

same

error:
$ iconv -t utf-8 foo.sql > utf.sql
iconv: illegal input sequence at position 2512661

Any idea waht the actual encoding of your source database is?
SQL_ASCII is basically not really ascii, more like anything goes.

How would I find this? pg_database says my DB is SQL_ASCII.
"show all" says
client_encoding = SQL_ASCII
server_encoding = SQL_ASCII

It would have been set by the application accessing postgresql and
inserting the data. I.e. was it a windows app using a typical windows
encoding? etc.

#12Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#11)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Its a Rails app and I do have:
  encoding: utf8

Hmmm, if you try this does it work (mostly)?

iconv -f utf-8 -t utf-8 -c < infile > outfile

#13Scott Marlowe
scott.marlowe@gmail.com
In reply to: Scott Marlowe (#12)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 1:41 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Its a Rails app and I do have:
  encoding: utf8

Hmmm, if you try this does it work (mostly)?

iconv -f utf-8 -t utf-8 -c < infile > outfile

If that doesn't work try:

iconv -f utf-8 -t utf-8//IGNORE -c < infile > outfile

#14Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#12)
Re: Change server encoding after the fact

That worked, but "file" shows no difference:

$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii

$file -i utf.sql
utf.sql: text/plain; charset=us-ascii

So iconv didnt actually convert the file OR does is the "file" command just
ignorant?

On Fri, Sep 30, 2011 at 12:41 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

Show quoted text

On Fri, Sep 30, 2011 at 1:39 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Its a Rails app and I do have:
encoding: utf8

Hmmm, if you try this does it work (mostly)?

iconv -f utf-8 -t utf-8 -c < infile > outfile

#15Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#14)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:

That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command just
ignorant?

Not sure. try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?

#16Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#15)
Re: Change server encoding after the fact

Please see below.

On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:

That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command

just

ignorant?

Not sure. try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?

Uh oh.

On the remote machine:

$ pg_dump -Fc -E UTF8 foo > foo.sql

Then I've created a new local DB with UTF8 encoding and I try to restore
this dump into it:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT: COPY wine_books, line 1147
WARNING: errors ignored on restore: 1

And sure enough the table "wine_books" is empty. Not good.

#17Scott Marlowe
scott.marlowe@gmail.com
In reply to: Cody Caughlan (#16)
Re: Change server encoding after the fact

On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Please see below.

On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com> wrote:

That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command
just
ignorant?

Not sure.  try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?

Uh oh.
On the remote machine:
$ pg_dump -Fc -E UTF8 foo > foo.sql
Then I've created a new local DB with UTF8 encoding and I try to restore
this dump into it:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
 invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT:  COPY wine_books, line 1147
WARNING: errors ignored on restore: 1
And sure enough the table "wine_books" is empty. Not good.

You may have to hunt down that one bad line (1147) and chop it out /
edit it so it works.

#18Cody Caughlan
toolbag@gmail.com
In reply to: Scott Marlowe (#17)
Re: Change server encoding after the fact

Thanks y'all for your help on this.

I took this opportunity to upgrade to 9.1.1 which is UTF8 by default and I
ended up manually cleaning up the borked data by hand (there wasn't that
much).

So all is well now.

Thanks again.

/Cody

On Fri, Sep 30, 2011 at 3:37 PM, Scott Marlowe <scott.marlowe@gmail.com>wrote:

Show quoted text

On Fri, Sep 30, 2011 at 2:57 PM, Cody Caughlan <toolbag@gmail.com> wrote:

Please see below.

On Fri, Sep 30, 2011 at 1:12 PM, Scott Marlowe <scott.marlowe@gmail.com>
wrote:

On Fri, Sep 30, 2011 at 1:45 PM, Cody Caughlan <toolbag@gmail.com>

wrote:

That worked, but "file" shows no difference:
$ iconv -f utf-8 -t utf-8 -c foo.sql > utf.sql
$ file -i foo.sql
foo.sql: text/plain; charset=us-ascii
$file -i utf.sql
utf.sql: text/plain; charset=us-ascii
So iconv didnt actually convert the file OR does is the "file" command
just
ignorant?

Not sure. try loading the dump into the UTF-8 DB in postgres and see
what happens I guess?

Uh oh.
On the remote machine:
$ pg_dump -Fc -E UTF8 foo > foo.sql
Then I've created a new local DB with UTF8 encoding and I try to restore
this dump into it:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2342; 0 17086 TABLE DATA
wine_books vinosmith
pg_restore: [archiver (db)] COPY failed for table "wine_books": ERROR:
invalid byte sequence for encoding "UTF8": 0xc309
CONTEXT: COPY wine_books, line 1147
WARNING: errors ignored on restore: 1
And sure enough the table "wine_books" is empty. Not good.

You may have to hunt down that one bad line (1147) and chop it out /
edit it so it works.