ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

Started by AI Rumman about 15 years ago, 5 messages (pgsql-general)
#1 AI Rumman
rummandba@gmail.com

I am getting an error in PostgreSQL 9.0.1.

update import_details_test
set data_row = '["4","1 Monor JoÃ\u083ão S. AntÃ\u0083ão (Schools
21,22,76)( Ru)","http://www.asdas.aa.nj.us","","908 436
4861","","","--None--","",","","--None--","","","0","","--None--","0","2008-12-29
17:53:08","","2010-08-23 14:58:56","0","Vantage Learning-OTS","
hsmith@vantage.com","","","","","","","","","","","","","","0","","","","","","","0","","300
S BROAD
ST","","","","ELIZABETH","","NJ","","07208","","","","","UNION","0","School","0","Tier
II","","0","","","","","ELIZABETH CITY","Elementary
School","Public","500","0","0","0","Suburban","","--None--","","","","0","","","","","","","","","","","","","--None--","0","","","","","","","","","","","","","","","","0","0","","0","0","0","--None--","--None--","","","","","0","","","0","","0","","","","0","","","","","","","","","","","","","","","","","","","","","0","","","","","","","","",""]'
WHERE import_queue_id = '35' AND record_num = 181;
ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

Any ideas, please?

#2 Scott Ribe
scott_ribe@elevated-dev.com
In reply to: AI Rumman (#1)
Re: ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

On Feb 27, 2011, at 5:47 AM, AI Rumman wrote:

Any ideas, please?

Don't do that ;-)

Seriously, the error means exactly what it says, so you have to figure out why your app is trying to insert invalid UTF-8.
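One way to start that investigation, sketched under the assumption that the failing value can be captured as raw bytes in Python before it is sent (find_invalid_utf8 is a hypothetical helper name): Python's strict UTF-8 decoder reports where the first invalid sequence begins. The code 0xc35c reads as two bytes: 0xC3, a UTF-8 lead byte that must be followed by a continuation byte in the range 0x80-0xBF, and 0x5C, an ASCII backslash. That is consistent with the "Ã\u083" fragment in the query, since "Ã" is the single byte 0xC3 in Latin-1.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, lead byte plus the byte after it) for the first
    invalid UTF-8 sequence in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict mode: raises on the first bad sequence
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte the decoder rejected.
        return exc.start, data[exc.start:exc.start + 2]
    return None


# Hypothetical reconstruction of the start of the failing value, assuming the
# text was Latin-1: "Jo", then 0xC3 ("Ã" in Latin-1), then a literal "\u083".
sample = b"Jo\xc3\\u083"
offset, bad = find_invalid_utf8(sample)
print(offset, bad.hex())  # 2 c35c
```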

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: AI Rumman (#1)
Re: ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

On 27/02/11 20:47, AI Rumman wrote:

I am getting an error in PostgreSQL 9.0.1.

update import_details_test
set data_row = '["4","1 Monor Jo�\u083�£o S. Ant�\u0083�£o

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because your email client may have transformed the text encoding, I
can't make any certain conclusions about what you're actually sending to
the database, but it's highly likely that you're sending latin-1 encoded
text to the database while your client_encoding is set to 'utf8'.
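To make that diagnosis concrete, here is a sketch with the hypothetical value "João". The same text produces different bytes in Latin-1 and UTF-8, and the Latin-1 bytes fail a strict UTF-8 check. Python's decoder stands in for the server's validation here; treating the two as equivalent for this input is an assumption of the sketch.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "João"  # hypothetical sample value
latin1_bytes = text.encode("latin-1")  # b'Jo\xe3o': "ã" is the single byte 0xE3
utf8_bytes = text.encode("utf-8")      # b'Jo\xc3\xa3o': "ã" is two bytes

print(is_valid_utf8(utf8_bytes))    # True
# 0xE3 announces a 3-byte UTF-8 sequence, but the next byte ("o") is not a
# continuation byte, so a 'utf8' client_encoding would reject this input.
print(is_valid_utf8(latin1_bytes))  # False
```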

The marked text is most likely the problem... but I think there's more
wrong with it than just being latin-1 encoded. That kind of mangling
often comes about when utf-8 text has been incorrectly interpreted as
latin-1 and modified, or when something has incorrectly tried to do
utf8<->latin-1 conversions more than once. You really need to figure out
what encoding your input is in, convert it to a known encoding like
utf-8 *once*, and keep it that way.
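The "UTF-8 interpreted as latin-1" failure mode can be reproduced in a few lines. This is a sketch with a hypothetical value: it shows one round of the mangling and why undoing it only works when you know exactly which steps happened.

```python
original = "João"  # hypothetical value

utf8_bytes = original.encode("utf-8")     # b'Jo\xc3\xa3o' (correct UTF-8)
misread = utf8_bytes.decode("latin-1")    # each UTF-8 byte became its own character
double_encoded = misread.encode("utf-8")  # the misread text, encoded again

print(misread)         # JoÃ£o
print(double_encoded)  # b'Jo\xc3\x83\xc2\xa3o'

# Undo exactly one round by reversing each step in the opposite order.
# This only works if nothing altered the text in between and the wrong codec
# really was Latin-1 (a Windows-1252 misread differs for bytes 0x80-0x9F).
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```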

If you're using Python, which I suspect you might be, the .decode()
method (str.decode() in Python 2, bytes.decode() in Python 3) is useful.
For example, I can convert a Latin-1 encoded byte string to a Python
Unicode string with:

b"somelatin1string".decode("latin-1")

Sometimes you can get away with just "SET client_encoding = 'LATIN1'",
but in this case your string data looks like it's been mangled by more
than a single encoding misinterpretation, so you'll probably just
silently insert corrupt data by doing that. Don't. Fix your code so it
knows what the text encoding of the input is.
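Why would a LATIN1 client_encoding silently accept damaged input? A quick sketch, assuming Python's latin-1 codec behaves like the server's LATIN1 conversion for this purpose: Latin-1 assigns a character to every possible byte, so decoding never fails and mangled bytes are simply stored as the wrong characters.

```python
# Every byte 0x00-0xFF maps to exactly one code point in Latin-1,
# so decoding arbitrary bytes as Latin-1 cannot raise an error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# The bytes that failed UTF-8 validation (hypothetical reconstruction) ...
suspect = b"Jo\xc3\\u083"
# ... are accepted as Latin-1, but yield "Ã" followed by a literal "\u083".
print(suspect.decode("latin-1"))  # JoÃ\u083
```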

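If you are, in fact, using Python, it's a really good idea to always
.decode() all your inputs so your internal processing is done on Unicode
strings (how CPython stores them internally is an implementation detail:
UCS-2/UTF-16 or UCS-4 depending on the build in Python 2, and a flexible
per-string representation since Python 3.3). Similarly, Qt programmers
should convert everything to a Unicode QString as soon as possible and
use that for all internal manipulation. It'll save a lot of pain.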
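The "decode all your inputs" advice amounts to decoding once at the input boundary, working on str internally, and encoding once on output. A minimal sketch, assuming the source encoding is known up front (the function name and the whitespace-stripping step are illustrative only):

```python
def normalize_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes once from their declared encoding, process the text
    as str, and encode once as UTF-8 for the database."""
    # errors="strict" (the default) raises on bytes that are invalid in
    # source_encoding instead of guessing; note Latin-1 accepts every byte.
    text = raw.decode(source_encoding)
    text = text.strip()  # illustrative internal processing on Unicode text
    return text.encode("utf-8")


print(normalize_to_utf8(b"  Jo\xe3o \n", "latin-1"))  # b'Jo\xc3\xa3o'
```

Keeping the decode and encode at the edges means the rest of the program never has to guess what a given byte string contains.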

#4 Jasmin Dizdarevic
jasmin.dizdarevic@gmail.com
In reply to: Craig Ringer (#3)
Re: ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

@ALL: Wouldn't it be possible, and wise, to include an (optional) encoding converter in pgsql?

We're importing a lot of data from text files that are not UTF-8. We always
have to convert the encoding with another tool before using COPY.
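That conversion step can be a small script. A sketch, assuming the files' real encoding is known (convert_file_to_utf8 and the paths are hypothetical); it streams the file so large exports need not fit in memory:

```python
import shutil


def convert_file_to_utf8(src_path: str, dst_path: str, src_encoding: str) -> None:
    """Re-encode a text file as UTF-8 (hypothetical pre-COPY step)."""
    # newline="" keeps the original line endings untouched in both files.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # copyfileobj reads and writes in chunks; a byte that is invalid
        # in src_encoding raises UnicodeDecodeError rather than being guessed.
        shutil.copyfileobj(src, dst)
```

For example, convert_file_to_utf8("export_latin1.txt", "export_utf8.txt", "latin-1") would produce a file that COPY can load with a UTF-8 client_encoding.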

2011/2/28 Craig Ringer <craig@postnewspapers.com.au>


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5 Craig Ringer
craig@2ndquadrant.com
In reply to: Jasmin Dizdarevic (#4)
Re: ERROR: invalid byte sequence for encoding "UTF8": 0xc35c

On 03/03/11 09:18, Jasmin Dizdarevic wrote:

@ALL: Wouldn't it be possible, and wise, to include an (optional) encoding converter in pgsql?

We're importing a lot of data from text files that are not UTF-8. We
always have to convert the encoding with another tool before using COPY.

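SET client_encoding. For example, run SET client_encoding = 'LATIN1';
before COPY, and the server will treat the incoming data as Latin-1 and
convert it to the database encoding.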
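Here is one way to apply that in practice: a sketch that builds a psql script setting client_encoding before loading the file. The helper name, table name, file path, and LATIN1 default are all hypothetical. psql's \copy reads the file on the client and sends it through the same session, so the SET applies to that data.

```python
def psql_load_script(table: str, path: str, encoding: str = "LATIN1") -> str:
    """Build psql commands that load a non-UTF-8 CSV file (hypothetical helper)."""
    if "'" in path:
        # Keep the sketch simple: no quoting logic for awkward file names.
        raise ValueError("path must not contain a single quote")
    return (
        f"SET client_encoding = '{encoding}';\n"
        f"\\copy {table} FROM '{path}' WITH (FORMAT csv)\n"
    )


print(psql_load_script("import_table", "/tmp/export_latin1.csv"))
```

Saved to a file and run with psql -f, the conversion is done by the server, so no separate re-encoding tool is needed.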

--
Craig Ringer