Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

Started by NTPTover 10 years ago7 messagesgeneral
Jump to latest
#1NTPT
NTPT@seznam.cz

Hi, all.

I need help.

pg_exec(): Query failed: ERROR: character with byte sequence 0xc2 0x96 in
encoding "UTF8" has no equivalent in encoding "WIN1250"

It is a strange. First there was a database with latin2 encoding.

to this database connect an aplicaton with "set client encoding to win1250"
and manipulating data

then database was dumped with pg_dump -E UTF8

then database was restored pg_restore on another cluster in database with
UTF8 encoding

then application connect to new database with "set client encoding to win
1250"

and - query failed

How in this scenario could invaid  characters reach the database ???

And how to solve this ? Errort message is not very useful, because does not
provide any hint (at least column and row)

=

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: NTPT (#1)
Re: Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

NTPT wrote:

I need help.

pg_exec(): Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no
equivalent in encoding "WIN1250"

It is a strange. First there was a database with latin2 encoding.

to this database connect an aplicaton with "set client encoding to win1250" and manipulating data

then database was dumped with pg_dump -E UTF8

then database was restored pg_restore on another cluster in database with UTF8 encoding

then application connect to new database with "set client encoding to win1250"

and - query failed

How in this scenario could invaid characters reach the database ???

And how to solve this ? Errort message is not very useful, because does not provide any hint (at least
column and row)

I can reproduce that, and I think it is a bug.

Hex 96 is Unicode Code Point 2013 in Windows-1250, that is an "en dash".

1) You enter this byte into a Latin 2 database with client_encoding WIN1250,
and it gets stored as hex 96 in the database.

2) You dump this database with -E UTF8 and get hex C2 96 in the dump.

3) You restore this database to a new UTF8 database, the data end up
as hex C2 96.

4) You query with client_encoding WIN1250 and get the error you quote.

Now I think that the bug is in step 1).
Wikipedia says that hex 96 is undefined in Latin 2
(https://en.wikipedia.org/wiki/ISO/IEC_8859-2),
so instead of storing this byte, PostgreSQL should have complained that it
cannot be converted to Latin 2, since indeed there is no "em dash" defined
in Latin 2.

The bug seems to be in
backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c,
function win12502mic().
I think that the entries in win1250_2_iso88592 corresponding to undefined characters
should be 0x00 to produce an error.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3NTPT
NTPT@seznam.cz
In reply to: NTPT (#1)
Re: [GENERAL] Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

Hi,

Thanx for explanation.

but how to solve this  in existing  database ?

---------- Původní zpráva ----------
Od: Albe Laurenz <laurenz.albe@wien.gv.at>
Komu: 'NTPT *EXTERN*' <NTPT@seznam.cz>, pgsql-general@postgresql.org <pgsql-
general@postgresql.org>
Datum: 25. 11. 2015 12:54:17
Předmět: Re: [GENERAL] Query failed: ERROR: character with byte sequence 0xc
2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

"NTPT wrote:

I need help.

pg_exec(): Query failed: ERROR: character with byte sequence 0xc2 0x96 in

encoding "UTF8" has no

equivalent in encoding "WIN1250"

It is a strange. First there was a database with latin2 encoding.

to this database connect an aplicaton with "set client encoding to win

1250" and manipulating data

then database was dumped with pg_dump -E UTF8

then database was restored pg_restore on another cluster in database with

UTF8 encoding

then application connect to new database with "set client encoding to win

1250"

and - query failed

How in this scenario could invaid characters reach the database ???

And how to solve this ? Errort message is not very useful, because does

not provide any hint (at least

column and row)

I can reproduce that, and I think it is a bug.

Hex 96 is Unicode Code Point 2013 in Windows-1250, that is an "en dash".

1) You enter this byte into a Latin 2 database with client_encoding WIN1250,
and it gets stored as hex 96 in the database.

2) You dump this database with -E UTF8 and get hex C2 96 in the dump.

3) You restore this database to a new UTF8 database, the data end up
as hex C2 96.

4) You query with client_encoding WIN1250 and get the error you quote.

Now I think that the bug is in step 1).
Wikipedia says that hex 96 is undefined in Latin 2
(https://en.wikipedia.org/wiki/ISO/IEC_8859-2),
so instead of storing this byte, PostgreSQL should have complained that it
cannot be converted to Latin 2, since indeed there is no "em dash" defined
in Latin 2.

The bug seems to be in
backend/utils/mb/conversion_procs/latin2_and_win1250/latin2_and_win1250.c,
function win12502mic().
I think that the entries in win1250_2_iso88592 corresponding to undefined
characters
should be 0x00 to produce an error.

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general&quot;=

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: NTPT (#3)
Re: Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

NTPT wrote:

but how to solve this in existing database ?

Either update all affected columns in the source database
or edit the database dump...

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5NTPT
NTPT@seznam.cz
In reply to: NTPT (#1)
RE: [GENERAL] Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

but how to update affected columns ? error message does not  provide single 
clue ( at least row name)

And dump-restore ?  It  do not underestand how it could help.. dumped as
unicode  restore as unicode = I am at the same point ... dumping as latin2
and restore to utf8 will end with the some errors..  I suspect

---------- Původní zpráva ----------
Od: Albe Laurenz <laurenz.albe@wien.gv.at>
Komu: 'NTPT *EXTERN*' <NTPT@seznam.cz>
Datum: 25. 11. 2015 14:47:46
Předmět: RE: [GENERAL] Query failed: ERROR: character with byte sequence 0xc
2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

"NTPT wrote:

but how to solve this in existing database ?

Either update all affected columns in the source database
or edit the database dump...

Yours,
Laurenz Albe"=

#6Laurenz Albe
laurenz.albe@cybertec.at
In reply to: NTPT (#5)
Re: Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

I realize I was too short, sorry.

NTPT wrote:

but how to update affected columns ? error message does not provide single clue ( at least row name)

For every table and every column in the source database that might be
affected, try something like:

SELECT id, col FROM tab WHERE col LIKE E'%\x96%';

Then issue UPDATE statements for the affected rows, e.g. replacing "–" with "-".

And dump-restore ? It do not underestand how it could help.. dumped as unicode restore as unicode =
I am at the same point ... dumping as latin2 and restore to utf8 will end with the some errors.. I
suspect

You can run something like this over the plain text dump:

sed -e 's/–/-/g' dump.sql >fixed.sql

Of course there might be other windows characters lurking ...

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#6)
Re: Query failed: ERROR: character with byte sequence 0xc2 0x96 in encoding "UTF8" has no equivalent in encoding "WIN1250"

I wrote:

You can run something like this over the plain text dump:

sed -e 's/–/-/g' dump.sql >fixed.sql

Scrap that, it should of course be:

sed -e 's/–/-/g' dump.sql >fixed.sql

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general