fulltext search stemming/ spelling problems

Started by Corin about 16 years ago | 5 messages | general
#1 Corin
wakathane@gmail.com

Hi!

I'm using PostgreSQL 8.4.3 and trying to get stemming and spelling
correction working.

I already installed the myspell dictionaries using apt-get and created
postgres dictionaries like this:

Fulltext search configuration "public.english_ispell"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+------------------------------------
 asciihword      | english_ispell,english_stem,simple
 asciiword       | english_ispell,english_stem,simple
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_ispell,english_stem,simple
 hword_asciipart | english_ispell,english_stem,simple
 hword_numpart   | simple
 hword_part      | english_ispell,english_stem,simple
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_ispell,english_stem,simple

But when I do, for example, SELECT to_tsvector('english_ispell',
'gitar') the result is only:
'gitar':1

Shouldn't the word be corrected to 'guitar'?

SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
'gitar'

Thanks,
Corin

#2 Oleg Bartunov
oleg@sai.msu.su
In reply to: Corin (#1)
Re: fulltext search stemming/ spelling problems

On Thu, 8 Apr 2010, Corin wrote:

> But when I do, for example, SELECT to_tsvector('english_ispell', 'gitar') the
> result is only:
> 'gitar':1
>
> Shouldn't the word be corrected to 'guitar'?

The english_ispell dictionary is a morphological dictionary! Read the docs.
Also, the simple dictionary will never be invoked, since the english_stem
dictionary recognizes everything!

> SELECT plainto_tsquery('english_ispell','gitar') doesn't work either:
> 'gitar'

Better to use the ts_debug() function or ts_dict() for testing.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3 Corin
wakathane@gmail.com
In reply to: Oleg Bartunov (#2)
Re: fulltext search stemming/ spelling problems

On 08.04.2010 20:15, Oleg Bartunov wrote:

> The english_ispell dictionary is a morphological dictionary! Read the docs.
> Also, the simple dictionary will never be invoked, since the english_stem
> dictionary recognizes everything!

I'm not sure what you mean by 'morphology'. I did read the docs but
couldn't find anything about 'morphology dictionaries'.

I created it myself with the following commands, after I installed the
ispell dictionaries using "apt-get":

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = system_en_us,
    AffFile = system_en_us
);

CREATE TEXT SEARCH CONFIGURATION english_ispell ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION english_ispell
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword,
                      hword_part
    WITH english_ispell, english_stem;

Thanks for the hint about the simple dictionary. I'll remove it, but if
it's never triggered, I guess that won't solve my problem either?

> Better to use the ts_debug() function or ts_dict() for testing.

ts_debug shows:
SELECT ts_debug('english_ispell','gitar');
(asciiword,"Word, all ASCII",gitar,"{english_ispell,english_stem}",english_stem,{gitar})
(1 line)
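
Calling ts_debug in the FROM clause returns its fields as separate columns, which can be easier to read than the composite row above. A minimal sketch, assuming the english_ispell configuration created earlier in this message; the column names are those of ts_debug's documented result type:

```sql
-- Expand the ts_debug() result into columns instead of one composite value.
-- "dictionaries" lists the dictionaries mapped to the token type;
-- "dictionary" is the one that actually recognized the token.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('english_ispell', 'gitar');
```

In the output shown above, the recognizing dictionary is english_stem, which means english_ispell did not know 'gitar' and passed the token on to the next dictionary in the list.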

ts_dict does not seem to exist; I couldn't find it in the docs either.

Thanks,
Corin

#4 Oleg Bartunov
oleg@sai.msu.su
In reply to: Corin (#3)
Re: fulltext search stemming/ spelling problems

On Thu, 8 Apr 2010, Corin wrote:

> On 08.04.2010 20:15, Oleg Bartunov wrote:
>> The english_ispell dictionary is a morphological dictionary! Read the docs.
>> Also, the simple dictionary will never be invoked, since the english_stem
>> dictionary recognizes everything!
>
> I'm not sure what you mean by 'morphology'. I did read the docs but
> couldn't find anything about 'morphology dictionaries'.

It means this (from http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY):

12.6.5. Ispell Dictionary

The Ispell dictionary template supports morphological dictionaries, which can normalize many different linguistic forms of a word into the same lexeme. For example, an English Ispell dictionary can match all declensions and conjugations of the search term bank, e.g., banking, banked, banks, banks', and bank's.

You were confused by the name!
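
The behaviour described in the documentation excerpt can be checked directly with ts_lexize. This is only a sketch, assuming the english_ispell dictionary defined earlier in the thread; the exact lexemes returned depend on the installed dictionary and affix files, so no expected output is given:

```sql
-- Inflected forms: a working Ispell dictionary should reduce these
-- to a base form such as "bank".
SELECT ts_lexize('english_ispell', 'banking');
SELECT ts_lexize('english_ispell', 'banks');

-- A token that is not in the dictionary: an Ispell dictionary does not
-- recognize it and returns NULL, so the next dictionary in the mapping
-- (english_stem in this configuration) handles the token instead.
SELECT ts_lexize('english_ispell', 'gitar');
```

Testing only base forms such as 'bank' or 'guitar' says little, because they come back unchanged whether or not the affix rules were loaded.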

> ts_dict does not seem to exist; I couldn't find it in the docs either.

Sorry, I meant ts_lexize.

Regards,
Oleg

#5 Corin
wakathane@gmail.com
In reply to: Oleg Bartunov (#4)
Re: fulltext search stemming/ spelling problems

On 08.04.2010 21:27, Oleg Bartunov wrote:

> It means this (from
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY):
>
> 12.6.5. Ispell Dictionary
>
> The Ispell dictionary template supports morphological dictionaries,
> which can normalize many different linguistic forms of a word into the
> same lexeme. For example, an English Ispell dictionary can match all
> declensions and conjugations of the search term bank, e.g., banking,
> banked, banks, banks', and bank's.

I already read this but I don't know how to solve my problems with this
information.

SELECT ts_lexize('english_ispell','guitar');
{guitar}
(1 line)

SELECT ts_lexize('english_ispell','bank');
{bank}
(1 line)

SELECT ts_debug('english_ispell','bank');
(asciiword,"Word, all ASCII",bank,"{english_ispell,english_stem}",english_ispell,{bank})
(1 line)

SELECT plainto_tsquery('english_ispell','bank');
'bank'
(1 line)

It would be very nice if you (or anyone else) could provide me with
concrete instructions or a howto. What can I do to find the error in
my setup? What output should I expect from the above commands if
everything worked correctly?

Thanks,
Corin
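
On the spelling question raised in the first message: text search dictionaries normalize tokens they recognize, and they do not replace a misspelled token such as 'gitar' with 'guitar'. One approach described in the PostgreSQL documentation for misspelled input is trigram similarity from the pg_trgm contrib module. The following is only a sketch under stated assumptions: pg_trgm must be installed (on 8.4 by running the contrib SQL script, on 9.1 and later with CREATE EXTENSION pg_trgm), and the table documents with a text column body, as well as the words table and its index name, are hypothetical names used for illustration.

```sql
-- Build a list of the distinct words that occur in the indexed documents.
-- "documents" and "body" are hypothetical; substitute the real table and column.
CREATE TABLE words AS
    SELECT word
    FROM ts_stat('SELECT to_tsvector(''simple'', body) FROM documents');

-- Trigram index so that similarity lookups on the word list stay fast.
CREATE INDEX words_idx ON words USING gin (word gin_trgm_ops);

-- Suggest known words that are similar to the misspelled input.
-- The % operator keeps rows whose similarity exceeds pg_trgm's threshold.
SELECT word, similarity(word, 'gitar') AS sml
FROM words
WHERE word % 'gitar'
ORDER BY sml DESC, word;
```

The best suggestion could then be passed to plainto_tsquery in place of the original input. The 'simple' configuration is used when building the word list so that the stored words are unstemmed.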