Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

Started by sunpengalmost 12 years ago7 messagesgeneral
Jump to latest
#1sunpeng
bluevaley@gmail.com

When I do migration from Mysql to PostgreSQL:
firstly dump data from mysql in cmd(encoding is GBK) is WIN8:
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact --no-create-info
--skip-quote-names -uroot -p test >dbdata.sql

then load data to postgresql in cmd(encoding is GBK) is WIN8:
psql -h localhost -d test -U postgres < dbdata.sql

I got the error:
ERROR: invalid byte sequence for encoding "UTF8": 0xff

I checked in dbdata.sql using UltraEdit, data "0xff" really exists as
followings:
蛾4?4抿\0xfF???±??x¤?
and I use UltraEdit menu-> view -> encoding page to get: 936 GBK.

DDL in Mysql 5.5 is:
CREATE TABLE `personpicture` (
`ID` char(32) NOT NULL,
`Picture` mediumblob,
...
KEY `personId` (`PersonID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DDL in PostgreSQL 9.1 is:
create database test encoding 'utf8';
CREATE TABLE personpicture (
ID char(32) NOT NULL,
Picture BYTEA,
....
PRIMARY KEY (ID)
);

The error is related to Picture mediumblob and BYTEA, which is stored pic
binary data.
My Operationg system is WIN8 64bit.

I've set postgresql.conf before starting everything, but error still occurs:
backslash_quote = on
escape_string_warning = off
standard_conforming_strings = off

How to resolve it?
Thanks!

peng

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: sunpeng (#1)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

sunpeng <bluevaley@gmail.com> wrote:

load data to postgresql in cmd(encoding is GBK) is WIN8:

psql -h localhost  -d test -U postgres <  dbdata.sql

I got the error:
ERROR:  invalid byte sequence for encoding "UTF8": 0xff

If the encoding is GBK then you will get errors (or incorrect
characters) if it is read as UTF8.  Try setting the environment
variable PGCLIENTENCODING.

http://www.postgresql.org/docs/9.1/static/app-psql.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3sunpeng
bluevaley@gmail.com
In reply to: Kevin Grittner (#2)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

I‘v changed cmd (in win8) to encoding utf8 through chcp 65001, but error
still occurs.
And i use the following cmd to dump mysql data:
mysql> select Picture from personpicture where id =
'F2931306D1EE44ca82394CD3BC2404D4' into outfile "d:\\1.txt" ;
I got the ansi file, and use Ultraedit to see first 16 bytes:
FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C
It's different from mysql workbench to see:
FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01

peng

On Tue, Jul 1, 2014 at 9:18 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Show quoted text

sunpeng <bluevaley@gmail.com> wrote:

load data to postgresql in cmd(encoding is GBK) is WIN8:

psql -h localhost -d test -U postgres < dbdata.sql

I got the error:
ERROR: invalid byte sequence for encoding "UTF8": 0xff

If the encoding is GBK then you will get errors (or incorrect
characters) if it is read as UTF8. Try setting the environment
variable PGCLIENTENCODING.

http://www.postgresql.org/docs/9.1/static/app-psql.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: sunpeng (#3)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

sunpeng wrote:

load data to postgresql in cmd(encoding is GBK) is WIN8:

psql -h localhost -d test -U postgres < dbdata.sql

I got the error:
ERROR: invalid byte sequence for encoding "UTF8": 0xff

If the encoding is GBK then you will get errors (or incorrect
characters) if it is read as UTF8. Try setting the environment
variable PGCLIENTENCODING.

http://www.postgresql.org/docs/9.1/static/app-psql.html

I‘v changed cmd (in win8) to encoding utf8 through chcp 65001, but error still occurs.
And i use the following cmd to dump mysql data:
mysql> select Picture from personpicture where id = 'F2931306D1EE44ca82394CD3BC2404D4' into outfile
"d:\\1.txt" ;
I got the ansi file, and use Ultraedit to see first 16 bytes:
FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C

It's different from mysql workbench to see:
FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01

Changing the terminal code page won't do anything, it's probably the data
that are in a different encoding.

I don't know enough about MySQL to know which encoding it uses when dumping data,
but the man page of "mysqldump" tells me:

--set-charset
Add SET NAMES default_character_set to the output. This option is enabled by default.

So is there a SET NAMES command in the dump? If yes, what is the argument?

You will have to tell PostgreSQL the encoding of the data.
As Kevin pointed out, you can do that by setting the environment variable
PGCLIENT ENCODING to the correct value. Then PostgreSQL will convert the
data automatically.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5sunpeng
bluevaley@gmail.com
In reply to: Laurenz Albe (#4)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

Thank you, friend, I use --hex-blob :
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact --no-create-info
--skip-quote-names --hex-blob -uroot -p test videorecresult >dbdata.sql
to dump mysql data.
And replace blob data "0x...." into "E'\\xx....'" to load data into
postgresql.

On Fri, Jul 4, 2014 at 3:27 PM, Albe Laurenz <laurenz.albe@wien.gv.at>
wrote:

Show quoted text

sunpeng wrote:

load data to postgresql in cmd(encoding is GBK) is WIN8:

psql -h localhost -d test -U postgres < dbdata.sql

I got the error:
ERROR: invalid byte sequence for encoding "UTF8": 0xff

If the encoding is GBK then you will get errors (or incorrect
characters) if it is read as UTF8. Try setting the environment
variable PGCLIENTENCODING.

http://www.postgresql.org/docs/9.1/static/app-psql.html

I‘v changed cmd (in win8) to encoding utf8 through chcp 65001, but error

still occurs.

And i use the following cmd to dump mysql data:
mysql> select Picture from personpicture where id =

'F2931306D1EE44ca82394CD3BC2404D4' into outfile

"d:\\1.txt" ;
I got the ansi file, and use Ultraedit to see first 16 bytes:
FF D8 FF E0 5C 30 10 4A 46 49 46 5C 30 01 01 5C

It's different from mysql workbench to see:
FF D8 FF E0 00 10 4a 46 49 46 00 01 01 00 00 01

Changing the terminal code page won't do anything, it's probably the data
that are in a different encoding.

I don't know enough about MySQL to know which encoding it uses when
dumping data,
but the man page of "mysqldump" tells me:

--set-charset
Add SET NAMES default_character_set to the output. This option is
enabled by default.

So is there a SET NAMES command in the dump? If yes, what is the argument?

You will have to tell PostgreSQL the encoding of the data.
As Kevin pointed out, you can do that by setting the environment variable
PGCLIENT ENCODING to the correct value. Then PostgreSQL will convert the
data automatically.

Yours,
Laurenz Albe

#6John R Pierce
pierce@hogranch.com
In reply to: sunpeng (#5)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

On 7/4/2014 2:12 AM, sunpeng wrote:

Thank you, friend, I use --hex-blob :
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact
--no-create-info --skip-quote-names --hex-blob -uroot -p test
videorecresult >dbdata.sql
to dump mysql data.
And replace blob data "0x...." into "E'\\xx....'" to load data into
postgresql.

regardless of all that, 0xFF is not a valid UTF8 character code. perhaps
you should store the data in a postgres BYTEA, or at least use character
encoding SQLASCII (which is to say, no encoding, bytes in == bytes out.)

--
john r pierce 37N 122W
somewhere on the middle of the left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: John R Pierce (#6)
Re: Migration error " invalid byte sequence for encoding "UTF8": 0xff " from mysql 5.5 to postgresql 9.1

John R Pierce wrote:

On 7/4/2014 2:12 AM, sunpeng wrote:

Thank you, friend, I use --hex-blob :
mysqldump -v -nt --complete-insert=TRUE --compatible=postgresql
--default-character-set=utf8 --skip-add-locks --compact
--no-create-info --skip-quote-names --hex-blob -uroot -p test
videorecresult >dbdata.sql
to dump mysql data.
And replace blob data "0x...." into "E'\\xx....'" to load data into
postgresql.

regardless of all that, 0xFF is not a valid UTF8 character code. perhaps
you should store the data in a postgres BYTEA, or at least use character
encoding SQLASCII (which is to say, no encoding, bytes in == bytes out.)

Exactly.
According to mysqldump's man page, the affected field must be
BINARY, VARBINARY, the BLOB types or BIT.
For these PostgreSQL's "bytea" would definitely be the correct data type,
and there wouldn't be any encoding problems with that.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general