invalid byte sequence for encoding "UNICODE"

Started by AlannY over 17 years ago | 3 messages | general
#1 AlannY
m@alanny.ru

Hi there.

I keep running into this strange problem: invalid byte
sequence for encoding "UNICODE". So, I guess, PostgreSQL won't let me
use some symbols that are not part of UNICODE. But which symbols
are those?

I'm attaching a screenshot with THAT dead symbol. As you can see, it's
an unknown symbol at the end of the Cyrillic text. First of all, I checked
my data with iconv (iconv -f UTF-8 -t UTF-8 data.txt) and there were no
errors, so, I guess, there are no dead symbols.

So the question is: is it possible to find a *table* of forbidden
characters for encoding "UNICODE"? If I can get it, I can kill those
dead characters in my program ;-)

Thank you.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: AlannY (#1)
Re: invalid byte sequence for encoding "UNICODE"

AlannY <m@alanny.ru> writes:

I keep running into this strange problem: invalid byte
sequence for encoding "UNICODE". So, I guess, PostgreSQL won't let me
use some symbols that are not part of UNICODE. But which symbols
are those?

Doesn't it tell you? AFAICS every PG version that uses that error
message phrasing gives you the exact byte sequence it's complaining
about.

It would also be worth asking what PG version you are using anyway.
If it's not a pretty recent update then updating might help --- I
think there were some bugs in the encoding verification stuff awhile
back.

regards, tom lane

#3 Valentine Gogichashvili
In reply to: AlannY (#1)
Re: invalid byte sequence for encoding "UNICODE"

On Jul 24, 8:06 pm, m...@alanny.ru (AlannY) wrote:

Hi there.

I keep running into this strange problem: invalid byte
sequence for encoding "UNICODE". So, I guess, PostgreSQL won't let me
use some symbols that are not part of UNICODE. But which symbols
are those?

I'm attaching a screenshot with THAT dead symbol. As you can see, it's
an unknown symbol at the end of the Cyrillic text. First of all, I checked
my data with iconv (iconv -f UTF-8 -t UTF-8 data.txt) and there were no
errors, so, I guess, there are no dead symbols.

So the question is: is it possible to find a *table* of forbidden
characters for encoding "UNICODE"? If I can get it, I can kill those
dead characters in my program ;-)

Thank you.

To tell the truth, no characters are forbidden in UNICODE, since
there are no characters you could have that are not in UNICODE.
UTF8 is a different matter: it encodes the UNICODE characters as a
sequence of 8-bit bytes. That is where the errors occur.
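
If it helps, here is a rough sketch of how one could look for such bad
byte sequences on the client side before the data reaches the server.
It is only an illustration in Python: the function name
find_invalid_utf8 is made up, and Python's strict UTF-8 decoder may not
agree with the server's check in every corner case, so treat a clean
result as a hint rather than a guarantee.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for each invalid UTF-8 sequence.

    Hypothetical helper for illustration; it relies only on Python's
    built-in strict UTF-8 decoder.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means nothing else is broken.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice we decoded.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            # Resume right after the offending bytes.
            pos = end
    return problems


if __name__ == "__main__":
    # Made-up sample: Cyrillic text cut in the middle of a two-byte character.
    sample = "Привет".encode("utf-8") + b"\xd0"
    for offset, bad in find_invalid_utf8(sample):
        print(f"offset {offset}: 0x{bad.hex()}")
```

For real data you would read the file in binary mode, e.g.
open("data.txt", "rb").read(), and pass the bytes to the function. The
offsets it reports can then be compared with the byte sequence named in
the server's error message.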

What does the command:

show lc_ctype;

show?

As Tom has said, more information about your system would be really
handy...

With best regards,

-- Valentine Gogichashvili