[TextSearch] syntax error while parsing affix file

Started by Daniel Chiaramello · over 17 years ago · 6 messages · general
#1Daniel Chiaramello
daniel.chiaramello@golog.net

Hello everybody.

I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
dictionary (the OpenOffice one) for Textsearch features.

I converted the dictionary encoding to UTF-8, and I installed it in the
"tsearch_data" folder.

But when I try to create the dictionary, I get a syntax error:

CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
TEMPLATE = ispell,
DictFile = bulgarian_utf8,
AffFile = bulgarian_utf8,
StopWords = english
);
ERREUR: erreur de syntaxe
CONTEXTE : ligne 24 du fichier de configuration «
/usr/share/pgsql/tsearch_data/bulgarian_utf8.affix » : « . > А
»

(it means ERROR: syntax error, CONTEXT: line 24 of configuration file ...)

Extract of the file around that line:

flag *A:
. > А (this is line 24)
. > АТА
. > И
. > ИТЕ

The file has Unix line endings (I suspected a line-ending problem because
the "CONTEXT" error line was split across 2 lines).
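
A quick way to confirm the line-ending suspicion is to count carriage-return bytes in the affix file. This is only a minimal shell sketch: `count_cr_bytes` is a hypothetical helper name, not something shipped with PostgreSQL, and the path in the usage comment is the one from the error message above.

```shell
# Hypothetical helper: print the number of carriage-return bytes in a file.
# A result of 0 means the file uses plain Unix line endings.
count_cr_bytes() {
    # Keep only CR bytes, count them, and strip the padding some wc builds add.
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Usage (commented out because it needs the installed file):
# count_cr_bytes /usr/share/pgsql/tsearch_data/bulgarian_utf8.affix
```

A non-zero count would mean the file still carries Windows line endings; a count of 0 rules that cause out.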

I'm really lost on how I can go further with the bulgarian dictionary...
Could you help me, please?

Thanks for your attention!
Daniel Chiaramello

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Chiaramello (#1)
Re: [TextSearch] syntax error while parsing affix file

Daniel Chiaramello <daniel.chiaramello@golog.net> writes:

I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
dictionary (the OpenOffice one) for Textsearch features.

I'm not an expert, but I think our ispell code supports only a subset of
the features that some other implementations have. So it doesn't
surprise me a lot that some configuration files don't work. You might
try one of the other sources for ispell files besides openoffice ---
see the links here:
http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

regards, tom lane

#3Teodor Sigaev
teodor@sigaev.ru
In reply to: Daniel Chiaramello (#1)
Re: [TextSearch] syntax error while parsing affix file

I am using Postgres 8.3.5, and I am trying to install a Bulgarian ISpell
dictionary (the OpenOffice one) for Textsearch features.

flag *A:
. > А (this is line 24)
. > АТА
. > И
. > ИТЕ

OpenOffice or ISpell? Pls, provide:
- link to download of dictionary
- Locale and encoding setting of your db

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#4Daniel Chiaramello
daniel.chiaramello@golog.net
In reply to: Teodor Sigaev (#3)
Re: [TextSearch] syntax error while parsing affix file

Teodor Sigaev wrote:

I am using Postgres 8.3.5, and I am trying to install a Bulgarian
ISpell dictionary (the OpenOffice one) for Textsearch features.

flag *A:
. > А (this is line 24)
. > АТА
. > И
. > ИТЕ

OpenOffice or ISpell? Pls, provide:
- link to download of dictionary
- Locale and encoding setting of your db

The dictionary is the ISpell one, which I got from the list at
http://wiki.services.openoffice.org/wiki/Dictionaries

Here is a direct link for it:
http://heanet.dl.sourceforge.net/sourceforge/bgoffice/ispell-bg-4.1.tar.gz

I converted its encoding from windows-1251 to UTF-8 before running the
CREATE TEXT SEARCH DICTIONARY:

iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.

Thanks!
Daniel
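
Before looking elsewhere, it can help to confirm that the converted files really are valid UTF-8. The following is a minimal sketch, assuming an `iconv` that exits with a non-zero status on invalid input (GNU iconv behaves this way); `is_valid_utf8` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv still has to decode
# every byte, so it fails on the first invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage (commented out because it needs the converted files):
# is_valid_utf8 bulgarian_utf8.affix && echo "affix file is valid UTF-8"
# is_valid_utf8 bulgarian_utf8.dict  && echo "dict file is valid UTF-8"
```

This only checks the bytes of the files. It says nothing about whether the server's locale accepts the characters they contain.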

#5Teodor Sigaev
teodor@sigaev.ru
In reply to: Daniel Chiaramello (#4)
Re: [TextSearch] syntax error while parsing affix file

iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.

I believe that characters 'И', 'А' (non-ASCII) and other Cyrillic ones are not
acceptable for the French locale :(

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
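
One way to look into this diagnosis is to compare the character set that a locale name implies with what the server reports. This is a rough sketch, assuming a system with the POSIX `locale` utility and a server reachable through `psql`; `locale_charmap` is a hypothetical helper name, and the locale names in the usage comments are examples that may not be installed on a given machine.

```shell
# Hypothetical helper: print the character set that a locale name implies.
locale_charmap() {
    # LC_ALL is set only for this one command; warnings about locales that
    # are not installed are discarded.
    LC_ALL=$1 locale charmap 2>/dev/null
}

# Usage (commented out because it needs the locales and a running server):
# locale_charmap fr_FR              # compare with: locale_charmap fr_FR.UTF-8
# psql -c 'SHOW lc_ctype;'          # locale the server uses to classify characters
# psql -c 'SHOW server_encoding;'   # encoding of the current database
```

If the character set implied by `lc_ctype` differs from the database encoding, that mismatch would be a plausible reason for the parser to reject Cyrillic letters, though the thread does not confirm it.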

#6Daniel Chiaramello
daniel.chiaramello@golog.net
In reply to: Teodor Sigaev (#5)
Re: [TextSearch] syntax error while parsing affix file

Teodor Sigaev wrote:

iconv -f windows-1251 -t utf-8 bulgarian.dic >bulgarian_utf8.dict
iconv -f windows-1251 -t utf-8 bulgarian.aff >bulgarian_utf8.affix

The locale of the database is fr_FR, and its encoding is UTF8.

I believe that characters 'И', 'А' (non-ASCII) and other Cyrillic ones
are not acceptable for the French locale :(

I was able to install a Thai dictionary. Why would that dictionary be
OK but not a Bulgarian one?
Which locale should I use so that my database can handle multiple
languages?

I would never have suspected a locale problem... Ouch!

Daniel