Invalid byte sequence errors on DB restore

Started by Samuel Smithabout 6 years ago6 messagesgeneral
Jump to latest
#1Samuel Smith
pgsql@net153.net

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith

#2Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Samuel Smith (#1)
Re: Invalid byte sequence errors on DB restore

On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Simple. Fix the offending string and upgrade.

Failure to enforce correct encoding is a bug in PostgreSQL, and a
number of such bugs have been fixed over the decades, so you might
be happier with a less ancient version.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Samuel Smith (#1)
Re: Invalid byte sequence errors on DB restore

On 3/15/20 9:18 PM, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
 and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

What are the locale and encodings set to for the instance and databases
in it?

Regards,
Samuel Smith

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Samuel Smith
pgsql@net153.net
In reply to: Laurenz Albe (#2)
Re: Invalid byte sequence errors on DB restore

On 3/16/20 2:49 AM, Laurenz Albe wrote:

On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Simple. Fix the offending string and upgrade.

Failure to enforce correct encoding is a bug in PostgreSQL, and a
number of such bugs have been fixed over the decades, so you might
be happier with a less ancient version.

Yours,
Laurenz Albe

Yes I thought I had seen some info on bug fixes in newer versions. I
will pass this on to the system admins/stakeholders.

Thanks,
Samuel Smith

#5Samuel Smith
pgsql@net153.net
In reply to: Adrian Klaver (#3)
Re: Invalid byte sequence errors on DB restore

On 3/16/20 9:33 AM, Adrian Klaver wrote:

On 3/15/20 9:18 PM, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically
just using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
  and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

What are the locale and encodings set to for the instance and databases
in it?

Regards,
Samuel Smith

The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but
setting it to UTF8 (via SET client_encoding ) did not help either.
Having the pg_dump encoding set to 'latin1' seems to allow the file
created it by it to be loaded via psql -f and everything seems to work.
Is there any bad side to setting the encoding on pg_dump to latin1?

For the record, the problem characters are:
https://www.htmlsymbols.xyz/unicode/U+00D7
and
https://www.htmlsymbols.xyz/unicode/U+00A0

But those characters were in many places and not all were issues. They
only fail depending on the characters that precede it which makes it
complicated.

Thanks,
Samuel Smith

#6Samuel Smith
pgsql@net153.net
In reply to: Samuel Smith (#1)
Re: Invalid byte sequence errors on DB restore

On 3/15/20 11:18 PM, Samuel Smith wrote:

My current DB backup routine is just to call pg_dump and pipe to gzip.
We recently started to get a failure to restore (which is basically just
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
 and
invalid byte sequence for encoding "UTF8": 0xd7 0x20

This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith

Our issue actually turned out to be that a couple of our production
database had the encoding set to SQL_ASCII while all of our development
servers had UTF-8. This meant in some cases where we would restore a
production database into development (for testing or bug hunting), there
would be a failure to parse the backup file. A similar issue to this
blog post:
https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8

So our fix was to dump the affected production databases using the
LATIN1 encoding for pg_dump followed by destroying and recreating the
database and setting its new encoding to UTF-8. Then we could restore
the data using the pg_dump file with LATIN1 encoding.

Regards,
Samuel Smith