Problem loading ispell affix file with apostrophes
I'm having a problem with French dictionaries. Loading an ispell affix
file with apostrophes does not work. The file comes from the ifrench
(French dictionary for ispell) Debian source package at
http://packages.debian.org/sid/ifrench
Here's the session excerpt:
------------------------------------------------------------------------
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
dockee=# select plainto_tsquery('custom_french', 'bug');
ERROR: syntax error at line 158 of affix file
"/usr/share/postgresql/8.3/tsearch_data/ispell_french.affix"
dockee=# show lc_ctype ;
lc_ctype
-------------
en_US.UTF-8
(1 row)
dockee=# show client_encoding;
client_encoding
-----------------
UTF8
(1 row)
dockee=# show server_version;
server_version
----------------
8.3.3
(1 row)
------------------------------------------------------------------------
The 'custom_french' text search configuration is defined as follows:
------------------------------------------------------------------------
CREATE TEXT SEARCH CONFIGURATION public.custom_french ( COPY =
pg_catalog.french );
CREATE TEXT SEARCH DICTIONARY french_ispell (
TEMPLATE = ispell,
DictFile = ispell_french,
AffFile = ispell_french
);
ALTER TEXT SEARCH CONFIGURATION custom_french
ALTER MAPPING FOR
asciiword,
asciihword,
hword_asciipart,
word,
hword,
hword_part
WITH french_ispell ;
ALTER TEXT SEARCH CONFIGURATION custom_french
DROP MAPPING FOR
url,url_path,sfloat,float,file,int,version;
------------------------------------------------------------------------
Line 158 of ispell_french.affix belongs to the first flag definition
that adds a prefix containing an apostrophe. It is the line just below
"flag *N":
------------------------------------------------------------------------
flag *D: # dé: défaire, dégrossir
. > dé
flag *N: # élision d'une négation
[aàâeèéêiîoôuh] > n' # je n'aime pas, il n'y a pas
------------------------------------------------------------------------
Maybe apostrophes in ispell affix files are simply not supported? I
can't find a mention of this limitation in the documentation at
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html
When commenting out the offending flag definitions, the affix file
loads successfully. Thanks in advance for helping me resolve this
problem.
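To find every rule that would need commenting out, a short script can list the rule lines whose replacement field contains an apostrophe. This is only a sketch: the function name find_apostrophe_rules is made up for illustration, and it assumes that '#' always starts a comment and that '>' separates the condition from the replacement, which ignores any escaping the affix format may allow.

```python
def find_apostrophe_rules(lines):
    """Return (line_number, rule_text) pairs for affix rules whose
    replacement field contains an apostrophe."""
    hits = []
    for lineno, raw in enumerate(lines, start=1):
        # Assumption: '#' starts a comment, so apostrophes in comments are ignored.
        rule = raw.split("#", 1)[0].strip()
        # Rule lines look like "condition > replacement"; flag headers have no '>'.
        if ">" not in rule:
            continue
        _condition, replacement = rule.split(">", 1)
        if "'" in replacement:
            hits.append((lineno, rule))
    return hits


# The excerpt quoted above, used as sample input.
sample = """\
flag *D: # dé: défaire, dégrossir
. > dé
flag *N: # élision d'une négation
[aàâeèéêiîoôuh] > n' # je n'aime pas, il n'y a pas
"""

hits = find_apostrophe_rules(sample.splitlines())
for lineno, rule in hits:
    print(f"line {lineno}: {rule}")
```

On the four-line sample only the last line is reported. For the real file, the same function can be given the lines of ispell_french.affix, opened with whatever encoding the file actually uses.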
--
Jean-Baptiste Quenot
http://jbq.caraldi.com/
"Jean-Baptiste Quenot" <jbq@caraldi.com> writes:
I'm having problem with french dictionaries. Loading an ispell affix
file with apostrophes does not work. The file comes from the ifrench
(french dict for ispell) debian source package at
http://packages.debian.org/sid/ifrench
dockee=# select plainto_tsquery('custom_french', 'bug');
ERROR: syntax error at line 158 of affix file
"/usr/share/postgresql/8.3/tsearch_data/ispell_french.affix"
Line 158 of file ispell_french.affix corresponds to the first flag
definition that triggers a prefix with an apostrophe, it's the line
below "flag *N"
------------------------------------------------------------------------
flag *D: # dé: défaire, dégrossir
. > dé
flag *N: # élision d'une négation
[aàâeèéêiîoôuh] > n' # je n'aime pas, il n'y a pas
------------------------------------------------------------------------
Maybe apostrophes in ispell affix files are simply not supported?
Looking at the code, apostrophe seems to be allowed as the first
character of the REPL field, but not anywhere else (in particular,
not after transitioning into PAE_INREPL state). Dunno if this is
a bug or intentional.
regards, tom lane
------------------------------------------------------------------------
flag *D: # dé: défaire, dégrossir
. > dé
flag *N: # élision d'une négation
[aàâeèéêiîoôuh] > n' # je n'aime pas, il n'y a pas
------------------------------------------------------------------------
Maybe apostrophes in ispell affix files are simply not supported?
Looking at the code, apostrophe seems to be allowed as the first
character of the REPL field, but not anywhere else (in particular,
not after transitioning into PAE_INREPL state). Dunno if this is
a bug or intentional.
Yeah, because the original ispell tries to lexize words like "book's", but
the apostrophe is a word-break character for our text parser. So I just added
this special case to the parser. But it seems to me we should allow the
apostrophe as a word character in the replace field, and maybe in the find
field too.
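The behaviour discussed in this thread can be modelled with a tiny state machine. The sketch below is a simplified Python model, not the PostgreSQL source: only the state name PAE_INREPL comes from the thread, while PAE_WAIT_REPL, parse_repl, AffixSyntaxError and the allow_inner_apostrophe switch are names assumed here for illustration. It accepts an apostrophe as the first character of the replace field and rejects it afterwards, unless the switch for the proposed relaxation is turned on.

```python
class AffixSyntaxError(ValueError):
    """Raised when the modelled scanner rejects a replace field."""


def parse_repl(field, allow_inner_apostrophe=False):
    """Scan the text after '>' in an affix rule and return the replacement.

    Simplified model of the behaviour described in this thread;
    it is not the actual PostgreSQL parser.
    """
    state = "PAE_WAIT_REPL"  # assumed name for "before the first character"
    out = []
    for ch in field:
        if state == "PAE_WAIT_REPL":
            if ch.isspace():
                continue  # skip blanks before the replacement starts
            if ch.isalpha() or ch == "'":
                out.append(ch)  # apostrophe is accepted as the first character
                state = "PAE_INREPL"
            else:
                raise AffixSyntaxError(f"unexpected {ch!r} at start of field")
        else:  # PAE_INREPL
            if ch.isspace() or ch == "#":
                break  # end of the replacement, rest is blank or comment
            if ch.isalpha() or (ch == "'" and allow_inner_apostrophe):
                out.append(ch)
            else:
                raise AffixSyntaxError(f"unexpected {ch!r} inside field")
    return "".join(out)


print(parse_repl(" 's"))  # leading apostrophe is accepted
try:
    parse_repl(" n' # je n'aime pas")
except AffixSyntaxError as exc:
    print("rejected:", exc)  # apostrophe after the first character
print(parse_repl(" n' # je n'aime pas", allow_inner_apostrophe=True))
```

With the switch off, the model rejects "n'" for the same reason given above: the apostrophe arrives after the scanner has already entered PAE_INREPL. With the switch on, which corresponds to treating the apostrophe as a word character in the replace field, the same rule is accepted.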
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/