ERROR: translation failed from server encoding to wchar_t
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').
In 8.1 with tsearch2 it worked perfectly ...
Thanks for you help,
ilan
This url provides a testcase ...
fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t
second pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
The to_tsvector seems to accept the row at random ...
Show quoted text
On Jan 7, 9:16 pm, ila...@gmail.com wrote:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').In 8.1 with tsearch2 it worked perfectly ...
Thanks for you help,
ilan
Found something interesting with this testcase.
update fails after SELECT query.
Can anyone confirm this ???
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> select * from dbmail_messageblks where messageblk_idnr =
12949;
messageblk_idnr | physmessage_id
|
messageblk
| blocksize | is_header |
idxfti
-----------------+----------------
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+-----------+-----------
+-------------------------------------------------------------------------------------------------
12949 | 6319 | l'\351quipe de Casinos-park a bien
re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que
l'un de nos responsables aura pris connaissance de votre envoi.
\012\012cordialement\012\012l'\351quipe de casinos-park.
\012\012====================\012\012The team of Casinos-park received
your messsage.\012\012You will have an answer as soon as one of our
persons in charge takes note of your sending. \012\012Best regards
\012\012The team of casinos-park.\012 | 398 | 0 | '=':3
'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13
'u':15 'y':14 'rk':4
(1 row)
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
Show quoted text
On Jan 7, 10:21 pm, ila...@gmail.com wrote:
This url provides a testcase ...
fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_tsecond pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1The to_tsvector seems to accept the row at random ...
On Jan 7, 9:16 pm, ila...@gmail.com wrote:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').In 8.1 with tsearch2 it worked perfectly ...
Thanks for you help,
ilan
http://pastebin.ca/845696
added schema ...
Show quoted text
On Jan 7, 10:21 pm, ila...@gmail.com wrote:
This url provides a testcase ...
fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_tsecond pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1The to_tsvector seems to accept the row at random ...
On Jan 7, 9:16 pm, ila...@gmail.com wrote:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').In 8.1 with tsearch2 it worked perfectly ...
Thanks for you help,
ilan
ilanco@gmail.com writes:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').
Two likely theories:
1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.
2. The encode() is yielding something that isn't valid UTF-8.
PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.
regards, tom lane
On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote:
ila...@gmail.com writes:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').Two likely theories:
1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.2. The encode() is yielding something that isn't valid UTF-8.
PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majord...@postgresql.org so that your
message can get through to the mailing list cleanly
Dear Tom,
Thanks for your reply.
This is the output of `locale` on my system :
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
As for your second scenario I guess you are right, it's possible
encode does not return all UTF8 characters.
But to_tsvector() succeeds and fails at random with this kind of
characters...
So how can I sanitize output from encode before I pipe it to
to_tsvector() ?
Regards,
Ilan
On Jan 8, 10:43 am, ila...@gmail.com wrote:
On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote:
ila...@gmail.com writes:
I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').Two likely theories:
1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.2. The encode() is yielding something that isn't valid UTF-8.
PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majord...@postgresql.org so that your
message can get through to the mailing list cleanlyDear Tom,
Thanks for your reply.
This is the output of `locale` on my system :
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=As for your second scenario I guess you are right, it's possible
encode does not return all UTF8 characters.
But to_tsvector() succeeds and fails at random with this kind of
characters...
So how can I sanitize output from encode before I pipe it to
to_tsvector() ?Regards,
Ilan
Tom,
To get around the non-UTF8 chars I used following function :
CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger"
AS $$
DECLARE
BEGIN
RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr;
BEGIN
NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));
RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr;
RETURN NEW;
EXCEPTION
WHEN character_not_in_repertoire THEN
RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %',
NEW.messageblk_idnr;
NEW.idxFTI := to_tsvector('simple',
'character_not_in_repertoire: This email contains illegal
characters.');
RETURN NEW;
END;
END;
$$
LANGUAGE plpgsql;
Hope this helps others with DBmail and tsearch2 on postgres 8.2
Thanks for your help Tom,
ilan
ilanco@gmail.com wrote:
NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));
I strongly doubt that this does what you think it does - I would check
the results if I were you. The $x$E$x$ should almost certainly not be
there - if you are trying to get E'foo' behaviour, that is purely for
literals. All you are doing here is to prepend a literal 'E' to your value.
cheers
andrew