UTF8 encoding problem

Started by Garry Saddington, almost 18 years ago, 7 messages, general
#1Garry Saddington
garry@schoolteachers.co.uk

I am getting illegal UTF8 encoding errors and I have traced it to the £ sign.
I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in postgresql.conf but
this has no effect. How can I sort this problem? Client_encoding =UTF8.
Regards
Garry

#2Michael Fuhr
mike@fuhr.org
In reply to: Garry Saddington (#1)
Re: UTF8 encoding problem

On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:

I am getting illegal UTF8 encoding errors and I have traced it to the £ sign.

What's the exact error message?

I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in postgresql.conf but
this has no effect. How can I sort this problem? Client_encoding =UTF8.

Is the data UTF-8? If the error is 'invalid byte sequence for encoding
"UTF8": 0xa3' then you probably need to set client_encoding to latin1,
latin9, or win1252.

--
Michael Fuhr

#3Giorgio Valoti
giorgio_v@mac.com
In reply to: Michael Fuhr (#2)
Re: UTF8 encoding problem

On 18 Jun 2008, at 03:04, Michael Fuhr wrote:

On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:

I am getting illegal UTF8 encoding errors and I have traced it to the £ sign.

What's the exact error message?

I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in postgresql.conf but
this has no effect. How can I sort this problem? Client_encoding =UTF8.

Is the data UTF-8? If the error is 'invalid byte sequence for encoding
"UTF8": 0xa3' then you probably need to set client_encoding to latin1,
latin9, or win1252.

Why?

--
Giorgio Valoti

#4Garry Saddington
garry@schoolteachers.co.uk
In reply to: Michael Fuhr (#2)
Re: UTF8 encoding problem

On Wednesday 18 June 2008 02:04, Michael Fuhr wrote:

On Tue, Jun 17, 2008 at 10:48:34PM +0100, Garry Saddington wrote:

I am getting illegal UTF8 encoding errors and I have traced it to the £
sign.

What's the exact error message?

I have set lc_monetary to "lc_monetary = 'en_GB.UTF-8'" in
postgresql.conf but this has no effect. How can I sort this problem?
Client_encoding =UTF8.

Is the data UTF-8? If the error is 'invalid byte sequence for encoding
"UTF8": 0xa3' then you probably need to set client_encoding to latin1,
latin9, or win1252.

Thanks, that's fixed it.
Garry

#5Michael Fuhr
mike@fuhr.org
In reply to: Giorgio Valoti (#3)
Re: UTF8 encoding problem

On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:

On 18 Jun 2008, at 03:04, Michael Fuhr wrote:

Is the data UTF-8? If the error is 'invalid byte sequence for
encoding "UTF8": 0xa3' then you probably need to set client_encoding
to latin1, latin9, or win1252.

Why?

UTF-8 has rules about what byte values can occur in sequence;
violations of those rules mean that the data isn't valid UTF-8.
This particular error says that the database received a byte with
the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.

The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
Garry got this error (I don't know if he did; I was asking) then
the byte 0xa3 must have appeared in some other sequence that wasn't
valid UTF-8. The usual reason for that is that the data is in some
encoding other than UTF-8.

Common encodings for Western European languages are Latin-1
(ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
of these encodings use a lone 0xa3 to represent the pound sign. If
the data has a pound sign as 0xa3 and the database complains that
it isn't part of a valid UTF-8 sequence then the data is likely to
be in one of these other encodings.
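
The byte values above can be checked with a short sketch. Python is an
assumption here (nothing in this thread says what client Garry uses), and
is_valid_utf8 is a hypothetical helper name chosen for illustration:

```python
# Illustrative sketch only: inspects how the pound sign is encoded.
POUND = "\u00a3"  # the pound sign

utf8_bytes = POUND.encode("utf-8")      # two bytes: 0xc2 0xa3
latin1_bytes = POUND.encode("latin-1")  # one byte: 0xa3


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0xa3 is a continuation byte with no lead byte, so it is not valid UTF-8.
lone = b"\xa3"

# The same lone byte read as Latin-1, Latin-9, and Windows-1252.
single_byte_views = {
    name: lone.decode(name) for name in ("latin-1", "iso8859-15", "cp1252")
}

print(utf8_bytes.hex(), latin1_bytes.hex())
print(is_valid_utf8(lone), is_valid_utf8(utf8_bytes))
print(single_byte_views)
```

If the lone byte fails the UTF-8 check but reads as a pound sign in all three
single-byte encodings, the data is most likely in one of them.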

--
Michael Fuhr

#6Garry Saddington
garry@schoolteachers.co.uk
In reply to: Michael Fuhr (#5)
Re: UTF8 encoding problem

On Wednesday 18 June 2008 14:00, Michael Fuhr wrote:

On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:

On 18 Jun 2008, at 03:04, Michael Fuhr wrote:

Is the data UTF-8? If the error is 'invalid byte sequence for
encoding "UTF8": 0xa3' then you probably need to set client_encoding
to latin1, latin9, or win1252.

Why?

UTF-8 has rules about what byte values can occur in sequence;
violations of those rules mean that the data isn't valid UTF-8.
This particular error says that the database received a byte with
the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.

The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
Garry got this error (I don't know if he did; I was asking) then
the byte 0xa3 must have appeared in some other sequence that wasn't
valid UTF-8. The usual reason for that is that the data is in some
encoding other than UTF-8.

Common encodings for Western European languages are Latin-1
(ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
of these encodings use a lone 0xa3 to represent the pound sign. If
the data has a pound sign as 0xa3 and the database complains that
it isn't part of a valid UTF-8 sequence then the data is likely to
be in one of these other encodings.

Thanks, I have traced it to a client_encoding problem and set it to latin1
which has cured the problem.
regards
garry
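
A rough sketch of why declaring the right client_encoding helps: the server
can then convert the incoming bytes to UTF-8 instead of rejecting them. The
Python below only mimics that conversion; it is not PostgreSQL's code, and
to_server_utf8 and the sample bytes are hypothetical:

```python
def to_server_utf8(raw: bytes, client_encoding: str) -> bytes:
    """Decode raw using the declared client encoding, then re-encode as UTF-8.

    Hypothetical helper that imitates a client-to-server encoding conversion.
    """
    return raw.decode(client_encoding).encode("utf-8")


# Sample input (made up): "Price: " followed by a Latin-1 pound sign and "5".
raw = b"Price: \xa35"

# Declared as Latin-1, the lone 0xa3 becomes the UTF-8 pair 0xc2 0xa3.
converted = to_server_utf8(raw, "latin-1")
print(converted)

# Declared as UTF-8, the same bytes are rejected as an invalid sequence.
try:
    to_server_utf8(raw, "utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```

In PostgreSQL itself the setting is changed with SET client_encoding, or by
whatever equivalent option the client library offers.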

#7Giorgio Valoti
giorgio_v@mac.com
In reply to: Michael Fuhr (#5)
Re: UTF8 encoding problem

On 18 Jun 2008, at 15:00, Michael Fuhr wrote:

On Wed, Jun 18, 2008 at 08:25:07AM +0200, Giorgio Valoti wrote:

On 18 Jun 2008, at 03:04, Michael Fuhr wrote:

Is the data UTF-8? If the error is 'invalid byte sequence for
encoding "UTF8": 0xa3' then you probably need to set client_encoding
to latin1, latin9, or win1252.

Why?

UTF-8 has rules about what byte values can occur in sequence;
violations of those rules mean that the data isn't valid UTF-8.
This particular error says that the database received a byte with
the value 0xa3 (163) in a sequence of bytes that wasn't valid UTF-8.

The UTF-8 byte sequence for the pound sign (£) is 0xc2 0xa3. If
Garry got this error (I don't know if he did; I was asking) then
the byte 0xa3 must have appeared in some other sequence that wasn't
valid UTF-8. The usual reason for that is that the data is in some
encoding other than UTF-8.

Common encodings for Western European languages are Latin-1
(ISO-8859-1), Latin-9 (ISO-8859-15), and Windows-1252. All three
of these encodings use a lone 0xa3 to represent the pound sign. If
the data has a pound sign as 0xa3 and the database complains that
it isn't part of a valid UTF-8 sequence then the data is likely to
be in one of these other encodings.

Much clearer now, thank you Michael.

--
Giorgio Valoti