Unicode Corruption and upgrading to 8.0.4. to 8.1

Started by Howard Cole over 20 years ago, 6 messages, general
#1 Howard Cole
howardnews@selestial.com

Hi everyone, I have a problem with corrupt UTF-8 sequences in my 8.0.4
dump which is preventing me from upgrading to 8.1 - which spots the
errors and refuses to import the data. Is there some SQL command that I
can use to fix or cauterise the sequences in the 8.0.4 database before
dumping to 8.1?

I think the problem arose using invalid client encodings - which were
not rejected prior to 8.1.

Regards,

Howard Cole
www.selestial.com

#2 Zlatko Matić
zlatko.matic1@sb.t-com.hr
In reply to: Howard Cole (#1)
Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

Have you tried restoring just the schema first, and then the data?
Greetings,

Zlatko

----- Original Message -----
From: "Howard Cole" <howardnews@selestial.com>
To: "'PgSql General'" <pgsql-general@postgresql.org>
Sent: Friday, December 02, 2005 3:02 PM
Subject: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1


#3 Howard Cole
howardnews@selestial.com
In reply to: Zlatko Matić (#2)
Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

Hi Zlatko,

I shall give this a try later and let you know how I get on. Thank you
for responding.

Howard.


#4 Markus Wollny
Markus.Wollny@computec.de
In reply to: Howard Cole (#3)
Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

Hello!

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On behalf of Howard Cole
Sent: Tuesday, 6 December 2005 13:41
To: 'PgSql General'
Subject: Re: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1

Hi everyone, I have a problem with corrupt UTF-8 sequences in my
8.0.4 dump which is preventing me from upgrading to 8.1 - which spots
the errors and refuses to import the data. Is there some SQL command
that I can use to fix or cauterise the sequences in the 8.0.4
database before dumping to 8.1?

I think the problem arose using invalid client encodings - which were
not rejected prior to 8.1.

We experienced exactly the same problem. You may be able to solve it by feeding the dump through iconv. See my earlier message on this issue:

http://archives.postgresql.org/pgsql-general/2005-11/msg00799.php

On top of that, you would be well advised to create the dump using the pg_dump that ships with PostgreSQL 8.1.
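
For readers who want to see what "feeding the dump through a filter" amounts to, here is a minimal Python sketch. It is not the iconv command from the linked message (which is not reproduced in this thread); it assumes the goal is simply to drop bytes that do not form valid UTF-8, comparable in spirit to running iconv with its -c flag. The function name strip_invalid_utf8 and the chunk size are illustrative assumptions, and Python's idea of valid UTF-8 may not match the checks in PostgreSQL 8.1 exactly.

```python
import codecs
import io


def strip_invalid_utf8(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst, dropping bytes that are not valid UTF-8.

    src and dst are binary file-like objects (hypothetical helper, not part
    of PostgreSQL or iconv).
    """
    # An incremental decoder copes with multi-byte characters that straddle
    # chunk boundaries; errors="ignore" silently discards invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Final flush: an incomplete trailing sequence is discarded as well.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    # Tiny in-memory demonstration; a real dump would be opened with
    # open(path, "rb") and open(path, "wb") instead.
    dirty = b"caf\xc3\xa9 \xff broken \xc3( end"
    cleaned = io.BytesIO()
    strip_invalid_utf8(io.BytesIO(dirty), cleaned, chunk_size=4)
    print(cleaned.getvalue())
```

Note that dropping bytes loses information: the affected strings import cleanly but the original characters are gone, so it is worth keeping the unfiltered dump.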

Kind regards

Markus

#5 Howard Cole
howardnews@selestial.com
In reply to: Markus Wollny (#4)
Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

Thanks Markus,

I am avoiding this solution at the moment since the database contains
binary (bytea) fields as well as text fields and I am unsure what iconv
would do to this data. If Zlatko's method does not work then I shall see
if I can programmatically use libiconv for all the relevant data.

Regards,

Howard Cole

#6 Markus Wollny
Markus.Wollny@computec.de
In reply to: Howard Cole (#5)
Re: Unicode Corruption and upgrading to 8.0.4. to 8.1

Hi!

-----Original Message-----
From: Howard Cole [mailto:howardnews@selestial.com]
Sent: Tuesday, 6 December 2005 15:38
To: Markus Wollny
Cc: PgSql General
Subject: Re: [GENERAL] Unicode Corruption and upgrading to 8.0.4. to 8.1

I am avoiding this solution at the moment since the database
contains binary (bytea) fields as well as text fields and I am
unsure what iconv would do to this data.

bytea data in a plain-text dump should be quite safe from iconv, because all the problematic bytes (decimal value below 32 or above 126) in the binary string are represented as SQL-escaped octets of the form \###.
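
To make the point about escaped octets concrete, here is a small Python sketch of that escaping rule. The helper escape_bytea is hypothetical and only imitates the format described above; it is not pg_dump's actual code, and a real dump may add a further layer of escaping on top (for example, doubled backslashes inside COPY data).

```python
def escape_bytea(data: bytes) -> str:
    """Render bytes in an octal-escape style like the one described above.

    Illustrative helper only, not taken from PostgreSQL.
    """
    parts = []
    for byte in data:
        if byte == 0x5C:
            # The backslash itself is escaped so that it stays unambiguous.
            parts.append("\\\\")
        elif byte < 32 or byte > 126:
            # Non-printable or non-ASCII byte: backslash plus three octal digits.
            parts.append("\\%03o" % byte)
        else:
            # Printable ASCII passes through unchanged.
            parts.append(chr(byte))
    return "".join(parts)


if __name__ == "__main__":
    escaped = escape_bytea(b"\x00A\xff")
    print(escaped)
    # Every character of the escaped form is 7-bit ASCII, so it is also
    # valid UTF-8 and a UTF-8 filter has nothing to remove.
    print(escaped.encode("ascii").decode("utf-8") == escaped)
```

Because the escaped form contains only printable ASCII, a UTF-8 to UTF-8 conversion has no reason to alter it, which is the basis for the claim that the binary columns should survive the filter.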

Kind regards

Markus