Text Search Configuration Problem

Started by Kevin Reynolds about 18 years ago · 3 messages · general
#1Kevin Reynolds
kreynolds98092@yahoo.com

I'm using PostgreSQL version 8.3.1 on CentOS 5 and am following the steps in section 12.7 of the documentation for creating a custom text search configuration.

When I get to the step that says:

CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);

I get the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0xe0c020
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Does anyone know how to solve this?


#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Reynolds (#1)
Re: Text Search Configuration Problem

Kevin Reynolds <kreynolds98092@yahoo.com> writes:

I get the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0xe0c020
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Are you sure those are in UTF8 encoding?

regards, tom lane
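
One way to answer that question is to try decoding each dictionary file as UTF-8 and see where decoding fails. Below is a minimal Python sketch of such a check. The helper names are hypothetical and nothing in it comes from the thread itself; it only relies on Python's standard `bytes.decode` behaviour.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a dictionary file as raw bytes and report the first bad offset."""
    return first_invalid_utf8(Path(path).read_bytes())
```

Calling `check_file()` on the .dict, .affix and .stop files (wherever they live on your system) returns `None` for a file that is clean UTF-8 and a byte offset otherwise. The bytes from the error message above, 0xe0 0xc0 0x20, fail this check because 0xc0 is not a valid continuation byte after 0xe0.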

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: Kevin Reynolds (#1)
Re: Text Search Configuration Problem

Kevin,

It looks like you are using UTF-8, so the problem is in the .aff file, which contains
Cyrillic comments :) I converted the files to UTF-8 encoding using iconv.

Oleg
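
For anyone who wants to do the same conversion on their own copy, here is a minimal Python sketch of what such a re-encoding step does. It is an illustration, not the command that was actually run: the source encoding is an assumption (KOI8-R is used as the default here; the thread does not say which encoding the original files used), and the function names are hypothetical.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "koi8_r") -> bytes:
    """Re-encode bytes from source_encoding (assumed, not confirmed) to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "koi8_r") -> None:
    """Write a UTF-8 copy of src to dst, leaving the original file untouched."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

The iconv equivalent would be along the lines of `iconv -f KOI8-R -t UTF-8 old.aff > new.aff`, again with the source encoding being a guess. Pure ASCII content, which is most of an English affix file, passes through unchanged; only the non-ASCII comment bytes are rewritten.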

On Thu, 3 Apr 2008, Kevin Reynolds wrote:

I'm using Postgresql version 8.3.1 on CentOS 5 and am following the steps in section 12.7 of the documentation for creating a custom text search configuration.

When I get to the step that says:

CREATE TEXT SEARCH DICTIONARY english_ispell (
TEMPLATE = ispell,
DictFile = english,
AffFile = english,
StopWords = english
);

I get the following error:

ERROR: invalid byte sequence for encoding "UTF8": 0xe0c020
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

I'm using the english ispell files from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/

Does anyone know how to solve this?


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83