I cannot insert bengali character in UTF8

Started by AI Rumman over 13 years ago · 4 messages · general

rummandba@gmail.com

over 13 years ago

I am using a database with UTF8 encoding and LC_CTYPE set to the default value in
PostgreSQL 9.1.
But I cannot insert Bengali characters into a column.

Query Failed:INSERT into tracker (user_id, module_name, item_id,
item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e

item_summary is a text column, and we can insert Japanese characters into
this field.

Could anybody tell me what the problem is here?

Peter Geoghegan

pg@bowt.ie

over 13 years ago

In reply to: AI Rumman (#1)

Re: I cannot insert bengali character in UTF8

On 20 July 2012 11:30, AI Rumman <rummandba@gmail.com> wrote:

> I am using a database with UTF8 encoding and LC_CTYPE set to the default value in
> PostgreSQL 9.1.
> But I cannot insert Bengali characters into a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
>
> item_summary is a text column, and we can insert Japanese characters into
> this field.
>
> Could anybody tell me what the problem is here?

Maybe they're not valid Bengali characters? Did you do an
encoding-naive truncation at some point?

My mail client cannot display the latter few characters before the
ellipsis, but can display the first few fine.
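The encoding-naive truncation hypothesis can be reproduced with a minimal Python sketch. The word "টেস্ট" ("test") and the 14-byte limit are assumptions chosen for illustration; the thread never shows the full original string or any length limit. Cutting the UTF-8 bytes in the middle of a character and then appending "..." produces the same E0 A6 2E sequence as the error message.

```python
# Hypothetical input: the word and the byte limit are illustrative assumptions.
word = "টেস্ট"                    # five code points, each 3 bytes in UTF-8
raw = word.encode("utf-8")        # 15 bytes
limit = 14                        # assumed byte limit that cuts the last character

# Encoding-naive truncation: slice the bytes, then append an ASCII ellipsis.
naive = raw[:limit] + b"..."
print(naive[12:].hex(" "))        # e0 a6 2e 2e 2e (dangling lead pair, then the dots)

# Encoding-aware alternative 1: truncate by code points instead of bytes.
by_chars = word[:4] + "..."

# Encoding-aware alternative 2: truncate bytes, then drop the incomplete tail.
by_bytes = raw[:limit].decode("utf-8", errors="ignore") + "..."

print(by_chars, by_bytes, by_chars == by_bytes)
```

Both alternatives yield valid UTF-8, but neither is grapheme-aware: here the result ends on a virama, which is well-formed yet visually incomplete. Note also that errors="ignore" silently discards invalid bytes anywhere in the input, not just at the cut.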

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Christian Ullrich

chris@chrullrich.net

over 13 years ago

In reply to: AI Rumman (#1)

Re: I cannot insert bengali character in UTF8

* AI Rumman wrote:

> I am using a database with UTF8 encoding and LC_CTYPE set to the default value in
> PostgreSQL 9.1.
> But I cannot insert Bengali characters into a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e

E0 A6 2E is not valid UTF-8: 11100000 10100110 00101110

The lead byte indicates that the code point is encoded in three bytes,
but only the very next byte is a trail byte (10......). The third
byte is a complete single-byte character of its own, a period ("."), to be exact.
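The byte-by-byte reading above can be checked mechanically. This is a minimal Python sketch based only on the standard UTF-8 bit patterns; the helper name `classify` is hypothetical, and it looks at the high-order bits alone, so it does not reject lead bytes that are never legal for other reasons (such as C0, C1 or F5 to F7).

```python
def classify(b: int) -> str:
    """Name the UTF-8 role of one byte from its high-order bit pattern only."""
    if (b & 0x80) == 0x00:
        return "single-byte character (0xxxxxxx)"
    if (b & 0xC0) == 0x80:
        return "trail byte (10xxxxxx)"
    if (b & 0xE0) == 0xC0:
        return "lead byte of a 2-byte sequence (110xxxxx)"
    if (b & 0xF0) == 0xE0:
        return "lead byte of a 3-byte sequence (1110xxxx)"
    if (b & 0xF8) == 0xF0:
        return "lead byte of a 4-byte sequence (11110xxx)"
    return "not a valid UTF-8 lead byte (11111xxx)"

# The bytes reported in the error message.
for b in bytes.fromhex("e0a62e"):
    print(f"{b:02X} {b:08b} {classify(b)}")
```

The loop prints the three lines E0 (3-byte lead), A6 (trail) and 2E (single-byte character), so the 3-byte sequence announced by E0 is cut short after two bytes.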

Setting the most significant bit (MSB) of the third byte gives us

11100000 10100110 10101110 = E0 A6 AE

which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.
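The bit arithmetic can be verified in a few lines of Python. This sketch only confirms that E0 A6 AE is well-formed and decodes to U+09AE; it does not establish that MA was the character originally intended.

```python
import unicodedata

seq = bytearray.fromhex("e0a62e")
seq[2] |= 0x80                    # set the most significant bit of the third byte
fixed = bytes(seq)                # E0 A6 AE

# Decode by hand: 4 payload bits from the lead byte, 6 from each trail byte.
cp = ((fixed[0] & 0x0F) << 12) | ((fixed[1] & 0x3F) << 6) | (fixed[2] & 0x3F)

ch = fixed.decode("utf-8")        # the standard decoder agrees with the manual result
print(f"{fixed.hex(' ').upper()} -> U+{cp:04X} {unicodedata.name(ch)}")
# E0 A6 AE -> U+09AE BENGALI LETTER MA
```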

Check your input data.
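One way to check input data is to validate byte strings on the client before they reach the INSERT. This is a minimal sketch: `first_invalid_utf8` is a hypothetical helper, and the 3-byte context window is an arbitrary choice that happens to match the length of the sequence in the error above, not a rule taken from PostgreSQL.

```python
from typing import Optional, Tuple

def first_invalid_utf8(data: bytes, context: int = 3) -> Optional[Tuple[int, str]]:
    """Return (byte offset, hex of a few bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")      # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.start + context].hex()
    return None

print(first_invalid_utf8("বাংলা".encode("utf-8")))   # None
print(first_invalid_utf8(b"ab\xe0\xa6..."))          # (2, 'e0a62e')
```

Where the ADODB application builds or shortens the string is not shown in the thread; the check is most useful right after the step that produces or truncates the text.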

--
Christian

AI Rumman

rummandba@gmail.com

over 13 years ago

In reply to: Christian Ullrich (#3)

Re: I cannot insert bengali character in UTF8

Wow. Great, informative answer. Thanks.

On Fri, Jul 20, 2012 at 7:11 PM, Christian Ullrich <chris@chrullrich.net> wrote:
