ERROR: translation failed from server encoding to wchar_t

Started by Nonameabout 18 years ago8 messages
#1Noname
ilanco@gmail.com

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"

My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

In 8.1 with tsearch2 it worked perfectly ...

Thanks for you help,

ilan

#2Noname
ilanco@gmail.com
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

http://pastebin.ca/845670

This url provides a testcase ...

fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t

second pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1

The to_tsvector seems to accept the row at random ...

Show quoted text

On Jan 7, 9:16 pm, ila...@gmail.com wrote:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"

My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

In 8.1 with tsearch2 it worked perfectly ...

Thanks for you help,

ilan

#3Noname
ilanco@gmail.com
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

Found something interesting with this testcase.
update fails after SELECT query.
Can anyone confirm this ???

dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> select * from dbmail_messageblks where messageblk_idnr =
12949;
 messageblk_idnr | physmessage_id
|
messageblk
| blocksize | is_header |
idxfti
-----------------+----------------
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+-----------+-----------
+-------------------------------------------------------------------------------------------------
           12949 |           6319 | l'\351quipe de Casinos-park a bien
re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que
l'un de nos responsables aura pris connaissance de votre envoi.
\012\012cordialement\012\012l'\351quipe de casinos-park.
\012\012====================\012\012The team of Casinos-park received
your messsage.\012\012You will have an answer as soon as one of our
persons in charge takes note of your sending. \012\012Best regards
\012\012The team of casinos-park.\012 |       398 |         0 | '=':3
'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13
'u':15 'y':14 'rk':4
(1 row)

dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1

Show quoted text

On Jan 7, 10:21 pm, ila...@gmail.com wrote:

http://pastebin.ca/845670

This url provides a testcase ...

fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t

second pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1

The to_tsvector seems to accept the row at random ...

On Jan 7, 9:16 pm, ila...@gmail.com wrote:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"

My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

In 8.1 with tsearch2 it worked perfectly ...

Thanks for you help,

ilan

#4Noname
ilanco@gmail.com
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

http://pastebin.ca/845696
added schema ...

Show quoted text

On Jan 7, 10:21 pm, ila...@gmail.com wrote:

http://pastebin.ca/845670

This url provides a testcase ...

fisrt pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t

second pass :
=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1

The to_tsvector seems to accept the row at random ...

On Jan 7, 9:16 pm, ila...@gmail.com wrote:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"

My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

In 8.1 with tsearch2 it worked perfectly ...

Thanks for you help,

ilan

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

ilanco@gmail.com writes:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"

My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

Two likely theories:

1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.

2. The encode() is yielding something that isn't valid UTF-8.

PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.

regards, tom lane

#6Noname
ilanco@gmail.com
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote:

ila...@gmail.com writes:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

Two likely theories:

1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.

2. The encode() is yielding something that isn't valid UTF-8.

PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majord...@postgresql.org so that your
message can get through to the mailing list cleanly

Dear Tom,

Thanks for your reply.
This is the output of `locale` on my system :
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

As for your second scenario I guess you are right, it's possible
encode does not return all UTF8 characters.
But to_tsvector() succeeds and fails at random with this kind of
characters...
So how can I sanitize output from encode before I pipe it to
to_tsvector() ?

Regards,

Ilan

#7Noname
ilanco@gmail.com
In reply to: Noname (#1)
Re: ERROR: translation failed from server encoding to wchar_t

On Jan 8, 10:43 am, ila...@gmail.com wrote:

On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote:

ila...@gmail.com writes:

I am using tsearch2 with pgsql 8.2.5 and get the following error when
calling to_tsvector :
"translation failed from server encoding to wchar_t"
My database is UTF8 encoded and the data sent to to_tsvector comes
from a bytea column converted to text with
encode(COLUMN, 'escape').

Two likely theories:

1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.

2. The encode() is yielding something that isn't valid UTF-8.

PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majord...@postgresql.org so that your
message can get through to the mailing list cleanly

Dear Tom,

Thanks for your reply.
This is the output of `locale` on my system :
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

As for your second scenario I guess you are right, it's possible
encode does not return all UTF8 characters.
But to_tsvector() succeeds and fails at random with this kind of
characters...
So how can I sanitize output from encode before I pipe it to
to_tsvector() ?

Regards,

Ilan

Tom,

To get around the non-UTF8 chars I used following function :

CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger"
AS $$
DECLARE
BEGIN
RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr;
BEGIN
NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));
RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr;
RETURN NEW;
EXCEPTION
WHEN character_not_in_repertoire THEN
RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %',
NEW.messageblk_idnr;
NEW.idxFTI := to_tsvector('simple',
'character_not_in_repertoire: This email contains illegal
characters.');
RETURN NEW;
END;
END;
$$
LANGUAGE plpgsql;

Hope this helps others with DBmail and tsearch2 on postgres 8.2

Thanks for your help Tom,

ilan

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Noname (#7)
Re: ERROR: translation failed from server encoding to wchar_t

ilanco@gmail.com wrote:

NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
NEW.messageblk, 'escape'));

I strongly doubt that this does what you think it does - I would check
the results if I were you. The $x$E$x$ should almost certainly not be
there - if you are trying to get E'foo' behaviour, that is purely for
literals. All you are doing here is to prepend a literal 'E' to your value.

cheers

andrew