I cannot insert Bengali characters in a UTF8 database
I am using a database with UTF8 encoding and LC_CTYPE set to the default value in
PostgreSQL 9.1.
But I cannot insert Bengali characters into a column.
Query Failed:INSERT into tracker (user_id, module_name, item_id,
item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
The item_summary column is of type text, and we can insert Japanese characters into
this field.
Could anybody tell me what the problem is here?
On 20 July 2012 11:30, AI Rumman <rummandba@gmail.com> wrote:
I am using database with UTF8 and LC_CTYPE set as default value in
Postgresql 9.1.
But I cannot insert bengali character in a column.
Query Failed:INSERT into tracker (user_id, module_name, item_id,
item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
Item_summary is a text type column and we can insert japanese character in
this field.
Could anybody let me know what is the problem here?
Maybe they're not valid Bengali characters? Did you do an
encoding-naive truncation (cutting the string at a byte count rather than
at a character boundary) at some point?
My mail client cannot display the last few characters before the
ellipsis, but displays the first few fine.
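The truncation hypothesis fits the error bytes plausibly well: 0x2E is a period, and the value ends in "...". Below is a minimal Python sketch of such a truncation. It assumes, hypothetically, that the full text was "বাংলা টেস্ট" and that the application cut it to 30 bytes before appending "...". Neither the text nor the limit is known from the thread, and `naive_truncate` and `first_invalid_utf8` are illustrative names.

```python
from typing import Optional, Tuple


def naive_truncate(text: str, max_bytes: int, suffix: bytes = b"...") -> bytes:
    """Cut the UTF-8 encoding at max_bytes without regard to character boundaries."""
    return text.encode("utf-8")[:max_bytes] + suffix


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (offset, hex of up to 3 bytes there) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.start + 3].hex()
    return None


if __name__ == "__main__":
    # Hypothetical stand-in: the real item_summary text is not known.
    sample = "বাংলা টেস্ট"
    print(len(sample.encode("utf-8")))  # 10 Bengali code points x 3 bytes + 1 space -> 31
    print(first_invalid_utf8(naive_truncate(sample, 30)))  # -> (28, 'e0a62e')
```

Under these assumptions, the cut lands after the first two bytes (E0 A6) of the three-byte ট (U+099F, encoded E0 A6 9F). The first "." then follows directly, which reproduces the 0xe0a62e in the error message. It would also explain why a mail client shows replacement characters just before the ellipsis.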
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
* AI Rumman wrote:
I am using database with UTF8 and LC_CTYPE set as default value in
Postgresql 9.1.
But I cannot insert bengali character in a column.
Query Failed:INSERT into tracker (user_id, module_name, item_id,
item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
E0 A6 2E is not valid UTF-8: 11100000 10100110 00101110
The lead byte indicates that the code point consists of three bytes,
but only the very next byte is a trail byte (10......). The third
byte (0.......) is a complete single-byte character: a period ("."), to be exact.
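The byte-level argument can be checked mechanically. Below is a minimal Python sketch that uses only the standard UTF-8 bit patterns. `classify_utf8_byte` and `explain` are illustrative names. The classification is simplified: it looks only at the high-order bits and ignores range rules. For example, C0, C1 and F5 to F7 are never valid in UTF-8, but here they are classified as lead bytes.

```python
def classify_utf8_byte(byte: int) -> str:
    """Classify a byte by its high-order bits (simplified: ignores overlong/range rules)."""
    if (byte & 0b1000_0000) == 0b0000_0000:
        return "single-byte character"   # 0xxxxxxx (ASCII)
    if (byte & 0b1100_0000) == 0b1000_0000:
        return "trail byte"              # 10xxxxxx
    if (byte & 0b1110_0000) == 0b1100_0000:
        return "lead byte of 2"          # 110xxxxx
    if (byte & 0b1111_0000) == 0b1110_0000:
        return "lead byte of 3"          # 1110xxxx
    if (byte & 0b1111_1000) == 0b1111_0000:
        return "lead byte of 4"          # 11110xxx
    return "never valid"                 # 11111xxx


def explain(hex_bytes: str) -> list:
    """Show each byte in binary and hex next to its role."""
    return [f"{b:08b} {b:02X} {classify_utf8_byte(b)}" for b in bytes.fromhex(hex_bytes)]


if __name__ == "__main__":
    for line in explain("e0a62e"):
        print(line)
    # 11100000 E0 lead byte of 3
    # 10100110 A6 trail byte
    # 00101110 2E single-byte character
```

A "lead byte of 3" must be followed by two trail bytes. The sequence therefore breaks at the third byte, which decodes on its own as "." (chr(0x2E)).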
Setting the MSB on the third byte gives us
11100000 10100110 10101110 = E0 A6 AE,
which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.
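The bit arithmetic can be reproduced in a few lines. Below is a minimal Python sketch using only the standard library. `set_msb` and `decode_3byte` are illustrative names, and the hand decoder only handles three-byte sequences: it does not reject overlong forms or surrogates. The sketch shows the arithmetic only. It does not establish that a lost MSB, rather than something else, produced the bad byte.

```python
import unicodedata


def set_msb(byte: int) -> int:
    """Return the byte with its most significant bit (0x80) set."""
    return byte | 0x80


def decode_3byte(seq: bytes) -> int:
    """Decode a 3-byte UTF-8 sequence 1110xxxx 10yyyyyy 10zzzzzz into a code point by hand."""
    if len(seq) != 3 or (seq[0] & 0xF0) != 0xE0 or any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError(f"not a 3-byte UTF-8 sequence: {seq.hex()}")
    # 4 payload bits from the lead byte, 6 from each trail byte
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


if __name__ == "__main__":
    broken = bytes.fromhex("e0a62e")
    repaired = broken[:2] + bytes([set_msb(broken[2])])
    cp = decode_3byte(repaired)
    print(repaired.hex(), f"U+{cp:04X}", unicodedata.name(chr(cp)))
    # e0a6ae U+09AE BENGALI LETTER MA
```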
Check your input data.
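If the bad input does come from byte-based truncation, one way to keep the input data valid is to cut on character boundaries instead. The thread does not show the application's own code or language. Below is a minimal Python sketch. `utf8_safe_truncate` is a hypothetical helper, and the 30-byte limit, the "..." suffix and the sample text are assumptions.

```python
def utf8_safe_truncate(text: str, max_bytes: int, suffix: str = "...") -> str:
    """Shorten text so its UTF-8 encoding (suffix included) fits max_bytes.

    Cuts only on code-point boundaries; assumes max_bytes >= the suffix's byte length.
    """
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text  # already fits, nothing to cut
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)
    # raw came from a valid str, so the only invalid part after slicing is an
    # incomplete sequence at the very end; errors="ignore" drops just that.
    return raw[:budget].decode("utf-8", errors="ignore") + suffix


if __name__ == "__main__":
    sample = "বাংলা টেস্ট"  # hypothetical sample text
    print(utf8_safe_truncate(sample, 30))  # বাংলা টেস...
```

This keeps whole code points but not whole grapheme clusters. In the example, the virama (্) after স is dropped, so the rendered text may differ slightly from a cut made by a human. The result is still valid UTF-8 that PostgreSQL will accept.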
--
Christian
Wow. A great, informative answer. Thanks.
On Fri, Jul 20, 2012 at 7:11 PM, Christian Ullrich <chris@chrullrich.net> wrote: